author Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit 4bfd864f10b68b71482b35c818559068ef8d5797
tree e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc4396.txt
parent ea76e11061bda059ae9f9ad130a9895cc85607db
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc4396.txt')
-rw-r--r-- doc/rfc/rfc4396.txt 3699
1 files changed, 3699 insertions, 0 deletions
diff --git a/doc/rfc/rfc4396.txt b/doc/rfc/rfc4396.txt
new file mode 100644
index 0000000..be4f173
--- /dev/null
+++ b/doc/rfc/rfc4396.txt
@@ -0,0 +1,3699 @@
+
+
+
+
+
+
+Network Working Group                                             J. Rey
+Request for Comments: 4396                                     Y. Matsui
+Category: Standards Track                                      Panasonic
+                                                           February 2006
+
+
+ RTP Payload Format
+ for 3rd Generation Partnership Project (3GPP) Timed Text
+
+Status of This Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2006).
+
+Abstract
+
+ This document specifies an RTP payload format for the transmission of
+ 3GPP (3rd Generation Partnership Project) timed text. 3GPP timed
+ text is a time-lined, decorated text media format with defined
+ storage in a 3GP file. Timed Text can be synchronized with
+ audio/video contents and used in applications such as captioning,
+ titling, and multimedia presentations. In the following sections,
+ the problems of streaming timed text are addressed, and a payload
+ format for streaming 3GPP timed text over RTP is specified.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Rey & Matsui Standards Track [Page 1]
+
+RFC 4396 Payload Format for 3GPP Timed Text February 2006
+
+
+Table of Contents
+
+ 1. Introduction ....................................................3
+ 2. Motivation, Requirements, and Design Rationale ..................3
+ 2.1. Motivation .................................................3
+ 2.2. Basic Components of the 3GPP Timed Text Media Format .......4
+ 2.3. Requirements ...............................................5
+ 2.4. Limitations ................................................6
+ 2.5. Design Rationale ...........................................7
+ 3. Terminology ....................................................10
+ 4. RTP Payload Format for 3GPP Timed Text .........................12
+ 4.1. Payload Header Definitions ................................13
+ 4.1.1. Common Payload Header Fields .......................15
+ 4.1.2. TYPE 1 Header ......................................17
+ 4.1.3. TYPE 2 Header ......................................20
+ 4.1.4. TYPE 3 Header ......................................23
+ 4.1.5. TYPE 4 Header ......................................24
+ 4.1.6. TYPE 5 Header ......................................25
+ 4.2. Buffering of Sample Descriptions ..........................25
+ 4.2.1. Dynamic SIDX Wraparound Mechanism ..................26
+ 4.3. Finding Payload Header Values in 3GP Files ................28
+ 4.4. Fragmentation of Timed Text Samples .......................31
+ 4.5. Reassembling Text Samples at the Receiver .................33
+ 4.6. On Aggregate Payloads .....................................35
+ 4.7. Payload Examples ..........................................39
+ 4.8. Relation to RFC 3640 ......................................43
+ 4.9. Relation to RFC 2793 ......................................44
+ 5. Resilient Transport ............................................45
+ 6. Congestion Control .............................................46
+ 7. Scene Description ..............................................47
+ 7.1. Text Rendering Position and Composition ...................47
+ 7.2. SMIL Usage ................................................48
+ 7.3. Finding Layout Values in a 3GP File .......................48
+ 8. 3GPP Timed Text Media Type .....................................49
+ 9. SDP Usage ......................................................53
+ 9.1. Mapping to SDP ............................................53
+ 9.2. Parameter Usage in the SDP Offer/Answer Model .............53
+ 9.2.1. Unicast Usage ......................................54
+ 9.2.2. Multicast Usage ....................................57
+ 9.3. Offer/Answer Examples .....................................58
+ 9.4. Parameter Usage outside of Offer/Answer ...................60
+ 10. IANA Considerations ...........................................60
+ 11. Security Considerations .......................................60
+ 12. References ....................................................61
+ 12.1. Normative References .....................................61
+ 12.2. Informative References ...................................61
+ 13. Basics of the 3GP File Structure ..............................64
+ 14. Acknowledgements ..............................................65
+
+1. Introduction
+
+ 3GPP timed text is a media format for time-lined, decorated text
+ specified in the 3GPP Technical Specification TS 26.245, "Transparent
+ end-to-end packet switched streaming service (PSS); Timed Text Format
+ (Release 6)" [1]. Besides plain text, the 3GPP timed text format
+ allows the creation of decorated text such as that for karaoke
+ applications, scrolling text for newscasts, or hyperlinked text.
+ These contents may or may not be synchronized with other media, such
+ as audio or video.
+
+ The purpose of this document is to provide a means to stream 3GPP
+ timed text contents using RTP [3]. This includes the streaming of
+ timed text being read out of a (3GP) file, as well as the streaming
+ of timed text generated in real-time, a.k.a. live streaming.
+
+ Section 2 contains the motivation for this document, an overview of
+ the media format, the requirements, and the design rationale.
+ Section 3 defines the terminology used. Section 4 specifies the
+ payload headers, the fragmentation and re-assembly rules for text
+ samples, the rules for payload aggregation, and the relations of this
+ document to RFC 3640 [12] and RFC 2793 [22]. Section 5 specifies
+ some simple schemes for resilient transport and gives pointers to
+ other possible mechanisms. Section 6 addresses congestion control.
+ Section 7 specifies scene description. Section 8 defines the media
+ type. Section 9 specifies SDP for unicast and multicast sessions,
+ including usage in the Offer/Answer model [13]. Sections 10 and 11
+ address IANA and security considerations. Section 12 lists
+ references. Basics of the 3GP File Structure are in Section 13.
+
+2. Motivation, Requirements, and Design Rationale
+
+2.1. Motivation
+
+ The 3GPP timed text format was developed for use in the services
+ specified in the 3GPP Transparent End-to-end Packet-switched
+ Streaming Services (3GPP PSS) specification [16].
+
+ As of today, PSS allows downloading 3GPP timed text contents stored
+ in 3GP files. However, due to the lack of an RTP payload format, it
+ is not possible to stream 3GPP timed text contents over RTP.
+
+ This document specifies such a payload format.
+
+2.2. Basic Components of the 3GPP Timed Text Media Format
+
+ Before going into the details of the design, it is necessary to know
+ how the media format is constructed. We can identify four
+ differentiated functional components: layout information, default
+ formatting, text strings, and decoration. The following list briefly
+ explains these and matches them to their designations in a 3GP file:
+
+ o Initial spatial layout information related to the text
+ strings: These are the height and width of the text region
+ where text is displayed, the position of the text region in
+ the display, and the layer or proximity of the text to the
+ user. In 3GP files, this information is contained in the
+ Track Header Box (3GP file designations are capitalized for
+ clarity).
+
+ o Default settings for formatting and positioning of text: style
+ (font, size, color,...), background color, horizontal and
+ vertical justification, line width, scrolling, etc. For 3GP
+ files, this corresponds to the Sample Descriptions.
+
+ o The actual text strings: encoded characters using either UTF-8
+ [18] or UTF-16 [19] encoding.
+
+ o The decoration: If some characters have different style,
+ delay, blink, etc., this needs to be indicated. The
+ decoration is only present in the text samples if it is
+ actually needed. Otherwise, the default settings as above
+ apply. In 3GP files, within each Text Sample, the decoration
+ (i.e., Modifier Boxes) is appended to the text strings, if
+ needed. At the time of writing this payload format, the
+ following modifiers are specified in the 3GPP timed text media
+ format specification [1]:
+
+ - text highlight
+ - highlight color
+ - blinking text
+ - karaoke feature
+ - hyperlink
+ - text delay
+ - text style
+ - positioning of the text box
+ - text wrap indication
+
+2.3. Requirements
+
+ Once the basic components are known, it is necessary to define which
+ requirements the payload format shall fulfill:
+
+ 1. It shall enable both live streaming and streaming from a 3GP
+ file.
+
+ Informative note: For the purpose of this document, the
+ term "live streaming" refers to those scenarios where
+ the timed text stream is sent from a live encoder. Upon
+ reception, the content may or may not be stored in a 3GP
+ file. Typically, in live streaming applications, the
+ sender encapsulates the timed text content in RTP
+ packets following the guidelines given in this document.
+ At the receiving side, a buffer is used to cancel the
+ network delay and delay jitter. If receiver and sender
+ support packet loss resilience mechanisms (see Section
+ 5), it may also be possible to recover from packet
+ losses. Note that how sender and receiver actually
+ manage and dimension the buffers is an implementation
+ design choice.
+
+ 2. Furthermore, it shall be possible for an RTP receiver using this
+ payload format, and capable of storing in 3GP format, to obtain
+ all necessary information from the RTP packets for storing the
+ received text contents according to the 3GP file format. This
+ file may or may not be the same as the original file.
+
+ Informative note: The 3GP file format itself is based on
+ the ISO Base Media File Format recommendation [2].
+ Section 13 gives some insight into the 3GP file
+ structure. Further, Sections 4.3 and 7.3 specify where
+ the information needed for filling in payload headers is
+ found in a 3GP file. For live streaming, appropriate
+ values complying with the format and units described in
+ [1] shall be used. Where needed, clarifications on
+ appropriate values are given in this document.
+
+ 3. It shall enable efficient and resilient transport of timed text
+ contents over RTP. In particular:
+
+ a. Enable the transmission of the sample descriptions by both
+ out-of-band and in-band means. Sample descriptions are
+ important information, which potentially apply to several
+ text samples. These default formatting settings are
+ typically transmitted out-of-band (reliably) once at the
+ initialization phase. If additional sample descriptions
+ are needed in the course of a session, these may also be
+ sent out-of-band or in-band. In-band transmission,
+ although unreliable, may be more appropriate for sending
+ sample descriptions if these should be sent frequently, as
+ opposed to establishing an additional communication channel
+ for SDP, for example. It is also useful in cases where an
+ out-of-band channel may not be available and for live
+ streaming, where contents are not known a priori. Thus,
+ the payload format shall enable out-of-band and in-band
+ transmission of sample descriptions. Section 4.1.6
+ specifies a payload header for transmitting sample
+ descriptions in-band. Section 9 specifies how sample
+ descriptions are mapped to SDP.
+
+ b. Enable the fragmentation of a text sample into several RTP
+ packets in order to cover a wide range of applications and
+ network environments. In general, fragmentation should be
+ a rare event, given the low bit rates and relatively small
+ text sample sizes. However, the 3GPP Timed Text media
+ format does allow for larger text samples. Therefore, the
+ payload format shall take this into account and provide a
+ means for coping with fragmentation and reassembly. Section
+ 4.4 deals with fragmentation.
+
+ c. Enable the aggregation of units into an RTP packet for
+ making the transport more efficient. In a mobile
+ communication environment, a typical text sample size is
+ around 100-200 bytes. If the available bit rate and the
+ packet size allow it, units should be aggregated into one
+ RTP packet. Section 4.6 deals with aggregation.
+
+ d. Enable the use of resilient transport mechanisms, such as
+ repetition, retransmission [11], and FEC [7] (see Section
+ 5). For a more general discussion, refer to RFC 2354 [8],
+ which discusses available mechanisms for stream repair.
+
+2.4. Limitations
+
+ The payload headers have been optimized in size for RTP. Instead
+ of using 32-bit (S)LEN, SDUR, and SIDX header fields, which would
+ carry many unused bits much of the time, it has been a design
+ choice to reduce the size of these fields. As a consequence, this
+ payload format has reduced maximum values with respect to sizes and
+ durations of (text) samples and sample descriptions. These maximum
+ values differ from those allowed in 3GP files, where they are
+ expressed using 32-bit (unsigned) integers. In some cases,
+ extension mechanisms are provided to deal with larger values.
+ However, it is noted that the values used here should be enough for
+ the streaming applications targeted.
+
+ The following limitations apply:
+
+ 1. The maximum size of text samples carried in RTP packets is
+ restricted to be a 16-bit (unsigned) integer (this includes the
+ text strings and modifiers). This means a maximum size for the
+ unit would be about 64 Kbytes. No extension mechanism is
+ provided.
+
+ 2. The sample description index values are restricted to be an 8-
+ bit (unsigned) integer. An extension mechanism is given in
+ Section 4.3.
+
+ 3. The text sample duration is restricted to be a 24-bit (unsigned)
+ integer. This yields a maximum duration at a timestamp
+ clockrate of 1000 Hz of about 4.6 hours. Nevertheless, an
+ extension mechanism is provided in Section 4.3.
+
+ 4. Sample descriptions are also restricted in size: If the size
+ cannot be expressed as a 16-bit (unsigned) integer, the sample
+ description shall not be conveyed. As in the case of the sample
+ size, no extension mechanism is provided.
+
+ 5. A further limitation concerns the UTF-16 encodings supported:
+ Only transport of text strings following big endian byte order
+ is supported. See Section 4.1.1 for details.
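
The numeric limits in the list above follow directly from the field widths. The short Python sketch below redoes that arithmetic; the constant and function names are illustrative only, and the 90000 Hz clockrate is an assumed example value that does not come from this document.

```python
# Upper bounds implied by the field widths listed above.
MAX_SIZE_BYTES = 2**16 - 1     # 16-bit size fields (samples, sample descriptions)
MAX_SIDX = 2**8 - 1            # 8-bit sample description index
MAX_SDUR_TICKS = 2**24 - 1     # 24-bit text sample duration


def max_duration_hours(clockrate_hz: int) -> float:
    """Longest duration a 24-bit SDUR can express, in hours."""
    return MAX_SDUR_TICKS / clockrate_hz / 3600.0


print(MAX_SIZE_BYTES)                        # 65535, i.e. about 64 Kbytes
print(round(max_duration_hours(1000), 2))    # 4.66 hours at 1000 Hz
print(round(max_duration_hours(90000), 3))   # 0.052 hours at an assumed 90 kHz
```

The higher the timestamp clockrate, the shorter the longest duration that fits in the field.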
+
+2.5. Design Rationale
+
+ The following design choices were made:
+
+ 1. 'Unit' approach: The payload formats specified in this document
+ follow a simple scheme: a 3-byte common header (Common Payload
+ Header) followed by a specific header for each text sample
+ (fragment) type. Following these headers, the text sample
+ contents are placed (Section 4.1.1 and following). This
+ structure is called a 'unit'.
+
+ The following units have been devised to comply with the
+ requirements mentioned in Section 2.3:
+
+ a. A TYPE 1 unit that contains one complete text sample,
+
+ b. A TYPE 2 unit that contains a complete text string or a
+ fragment thereof,
+
+ c. A TYPE 3 unit that contains the complete modifiers or only
+ the first fragment thereof,
+
+ d. A TYPE 4 unit that contains one modifier fragment other
+ than the first, and
+
+ e. A TYPE 5 unit that contains one sample description.
+
+ This 'unit' approach was motivated by the following reasons:
+
+ 1. Allows a simple classification of the text samples and
+ text sample fragments that can be conveyed by the
+ payload format.
+
+ 2. Enables easy interoperability with RFC 3640 [12].
+ During the development of this payload format, interest
+ was shown from MPEG-4 standardization participants in
+ developing a common payload structure for the transport
+ of 3GPP Timed Text. While interoperability is not
+ strictly necessary for this payload format to work, it
+ has been pursued in this payload format. Section 4.8
+ explains how this is done.
+
+ 2. Character count is not implemented. This payload format does
+ detect lost text sample fragments, but it does not enable an
+ RTP receiver to find out the exact number of text characters
+ lost. In fact, the fragment size included in the payload
+ headers does not help in finding the number of lost characters
+ because the UTF-8/UTF-16 [18][19] encodings used yield a
+ variable number of bytes per character.
+
+ For finding the exact number of lost characters, an additional
+ field reflecting the character count (and possibly the character
+ offset) upon fragmentation would be required. This would
+ additionally require that the entity performing fragmentation
+ count the characters included in each text fragment.
+
+ One benefit of having a character count would be that the
+ display application would be able to replace missing characters
+ through some other character representing character loss. For
+ example:
+
+ If we take the text "Some text is lost now" and assume the loss
+ of a packet containing the text in the middle, this could
+ be displayed (with a character count):
+
+ "Some ############now"
+
+ As opposed to:
+
+ "Some #now"
+
+ which is what this payload format enables ("#" indicates a
+ missing character or packet, respectively).
+
+ However, it is the consensus of the working group that for
+ applications such as subtitling applications and multimedia
+ presentations that use this payload format, such partial error
+ correction is not worth the cost of including two additional
+ fields; namely, character count and character offset. Instead,
+ it is recommended that some more overhead be invested to provide
+ full error correction by protecting the text sample
+ fragments using the measures outlined in Section 5.
+
+ 3. Fragment re-assembly: In order to re-assemble the text samples,
+ offset information is needed. Instead of a character or byte
+ offset, a single byte, TOTAL/THIS, is used. These two values
+ indicate the total number and current index of fragments of a
+ text sample. This is simpler than having a character offset
+ field in each fragment. Details are in Section 4.1.3.
+
+ 4. A length field, LEN, is present in the common header fields.
+ While the length in the RTP payload format is not needed by most
+ RTP applications (typically lower layers, like UDP, provide this
+ information), it does ease interoperability with RFC 3640. This
+ is because the Access Units (AUs) used for carriage of data in
+ RFC 3640 must include a length indication. Details are in
+ Section 4.8.
+
+ 5. The header fields in the specific payload headers (TYPE headers
+ in Sections 4.1.2 to 4.1.6) have been arranged for easy
+ processing on 32-bit machines. For this reason, the fields SIDX
+ and SDUR are swapped in TYPE 1 unit, compared to the other
+ units.
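
Design choice 2 above rests on the fact that a byte count does not determine a character count in UTF-8 or UTF-16. The Python sketch below illustrates this with a few arbitrary sample strings; the helper name encoded_size is hypothetical and the strings are not taken from this document.

```python
def encoded_size(text: str, utf16: bool) -> int:
    """Size in bytes of text as UTF-8, or as big endian UTF-16 without BOM."""
    return len(text.encode("utf-16-be" if utf16 else "utf-8"))


samples = [
    "lost",                      # 4 ASCII characters
    "l\u00f8st",                 # 4 characters, one outside ASCII
    "\u5931\u308f\u308c\u305f",  # 4 CJK/kana characters
    "\U0001d11e\U0001d11e",      # 2 characters outside the BMP
]
for s in samples:
    # characters, UTF-8 bytes, UTF-16 bytes
    print(len(s), encoded_size(s, False), encoded_size(s, True))
```

Four characters occupy 4, 5, or 12 bytes in UTF-8 here, and 8 bytes of UTF-16 hold either four or two characters, so a receiver that only knows how many bytes were lost cannot tell how many characters are missing.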
+
+3. Terminology
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in RFC 2119 [5].
+
+ Furthermore, the following terms are used and have specific meaning
+ within the context of this document:
+
+ text sample or whole text sample
+
+ In the 3GPP Timed Text media format [1], these terms refer to a
+ unit of timed text data as contained in the source (3GP) file.
+ This includes the text string byte count, possibly a Byte Order
+ Mark, the text string and any modifiers that may follow. Its
+ equivalent in audio/video would be a frame.
+
+ In this document, however, a text sample contains only text
+ strings followed by zero or more modifiers. This definition of
+ text sample excludes the 16-bit text string byte count and the
+ 16-bit Byte Order Mark (BOM) present in 3GP file text samples
+ (see Section 4.3 and Figure 9). The 16-bit BOM is not
+ transported in RTP, as explained in Section 4.1.1.
+
+ text strings
+
+ The actual text characters encoded either as UTF-8 or UTF-16.
+ When using this payload format, the text string does not contain
+ any byte order mark (BOM). See Figure 9 for details.
+
+ fragment or text sample fragment
+
+ A fraction of a text sample. A fragment may contain either text
+ strings or modifier (decoration) contents, but not both at the
+ same time.
+
+ sample contents
+
+ General term to identify timed text data transported when using
+ this payload format. Sample contents may be one or several text
+ samples, sample descriptions, and sample fragments (note that,
+ as per Section 4.6, there is only one case in which more than
+ one fragment may be included in a payload).
+
+ decoration or modifiers
+
+ These terms are used interchangeably throughout the document to
+ denote the contents of the text sample that modify the default
+ text formatting. Modifiers may, for example, specify different
+ font size for a particular sequence of characters or define
+ karaoke timing for the sample.
+
+ sample description
+
+ Information that is potentially shared by more than one text
+ sample. In a 3GP file, a sample description is stored in a
+ place where it can be shared. It contains setup and default
+ information such as scrolling direction, text box position,
+ delay value, default font, background color, etc.
+
+ units or transport units
+
+ The payload headers specified in this document encapsulate text
+ samples, fragments thereof, and sample descriptions by placing a
+ common header and specific payload header (Sections 4.1.1 to
+ 4.1.6) before them, thus building what is here called a
+ (transport) unit.
+
+ aggregation or aggregate packet
+
+ The payload of an aggregate (RTP) packet consists of several
+ (transport) units.
+
+ track or stream
+
+ 3GP files contain audio/video and text tracks. This document
+ enables streaming of text tracks using RTP. Therefore, these
+ terms are used interchangeably in this document in the context
+ of 3GP files.
+
+ Media Header Box / Track Header Box / ...
+
+ The 3GP file format makes use of these structures defined in the
+ ISO Base File Format [2]. When referring to these in this
+ document, initials are capitalized for clarity.
+
+4. RTP Payload Format for 3GPP Timed Text
+
+ The format of an RTP packet containing 3GPP timed text is shown
+ below:
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                           timestamp                           |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |           synchronization source (SSRC) identifier            |
+  /+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |U|   R   | TYPE|             LEN               |               :
+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :
+U| :          (variable header fields depending on TYPE)           :
+N| :                                                               :
+I< +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+T| |                                                               |
+ | :                        SAMPLE CONTENTS                        :
+ | |                                               +-+-+-+-+-+-+-+-+
+ | |                                               |
+  \+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 1. 3GPP Timed Text RTP Packet Format
+
+ Marker bit (M): The marker bit SHALL be set to 1 if the RTP packet
+ includes one or more whole text samples or the last fragment of a
+ text sample; otherwise, it is set to zero (0).
+
+ Timestamp: The timestamp MUST indicate the sampling instant of the
+ earliest (or only) unit contained in the RTP packet. The initial
+ value SHOULD be randomly determined, as specified in RTP [3].
+
+ The timestamp value should provide enough timing resolution for
+ expressing the duration of text samples, for synchronizing text
+ with other media, and for performing RTP Control Protocol (RTCP)
+ measurements such as the interarrival delay jitter or the RTCP
+ Packet Receipt Times Report Block (Section 4.3 of RFC 3611
+ [20]). This is compliant with RTP, Section 5.1:
+
+ "The resolution of the clock MUST be sufficient for the
+ desired synchronization accuracy and for measuring packet
+ arrival jitter (one tick per video frame is typically not
+ sufficient)".
+
+ The above observation applies to both timed text tracks included
+ in a 3GP file and live streaming sessions. In the case of a 3GP
+ timed text track, the timestamp clockrate is the value of the
+ "timescale" parameter in the Media Header Box for that text
+ track. Each track in a 3GP file MAY have its own clockrate as
+ specified in the Media Header Box. Likewise, live streaming
+ applications SHALL use an appropriate timestamp clockrate. A
+ default value of 1000 Hz is RECOMMENDED. Other timestamp
+ clockrates MAY be used. In this case, the typical behavior
+ is to match the 3GPP timed text clockrate to that used by an
+ associated audio or video stream.
+
+ In an aggregate payload, units MUST be placed in play-out order,
+ i.e., earliest first in the payload. If TYPE 1 units are
+ aggregated, the timestamp of the subsequent units MUST be
+ obtained by adding the timed text sample duration of previous
+ samples to the RTP timestamp value. There are two exceptions to
+ this rule: TYPE 5 units and an aggregate payload containing two
+ fragments of the same text sample. The details of the timestamp
+ calculation are given in Section 4.6.
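
The timestamp rule for aggregated TYPE 1 units can be sketched as follows. This is a minimal Python illustration, not part of the payload format: the function name unit_timestamps is hypothetical, and the two exceptions named above (TYPE 5 units and two fragments of the same sample) are deliberately left out because their handling is specified in Section 4.6.

```python
from typing import List

RTP_TS_MODULUS = 2**32  # the RTP timestamp field is 32 bits wide


def unit_timestamps(rtp_timestamp: int, durations: List[int]) -> List[int]:
    """Timestamp of each TYPE 1 unit in one aggregate payload.

    durations lists the sample duration of each unit in play-out order,
    expressed in timestamp clock ticks.
    """
    timestamps = []
    current = rtp_timestamp % RTP_TS_MODULUS
    for sdur in durations:
        timestamps.append(current)
        # The next unit starts when the previous sample ends.
        current = (current + sdur) % RTP_TS_MODULUS
    return timestamps


print(unit_timestamps(5000, [2000, 1500, 3000]))   # [5000, 7000, 8500]
print(unit_timestamps(2**32 - 100, [300, 50]))     # [4294967196, 200]
```

The second call shows the 32-bit wraparound that a receiver has to tolerate in long sessions.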
+
+ Finally, timestamp clockrates MUST be signaled by out-of-band
+ means at session setup, e.g., using the media type "rate"
+ parameter in SDP. See Section 9 for details.
+
+ Payload Type (PT): The payload type is set dynamically and sent by
+ out-of-band means.
+
+ The usage of the remaining RTP header fields (namely, V, P, X, CC, SN
+ and SSRC) follows the rules of RTP and the profile in use.
+
+4.1. Payload Header Definitions
+
+ The (transport) units specified in this document consist of a set of
+ common fields (U, R, TYPE, LEN), followed by specific header fields
+ (TYPES 1-5) and text sample contents. See Figure 1 and Figure 2.
+
+ In Figure 2, two example RTP packets are depicted. The first
+ contains an aggregate RTP payload with two complete text samples, and
+ the second contains one text sample fragment. After each unit header
+ is explained, detailed payload examples follow in Section 4.7.
+
+                   +----------------------+
+                   |                      |
+                   |      RTP Header      |
+                   |                      |
+          ---------+----------------------+
+          |        |                      |
+          |        |COMMON + TYPE 1 Header|
+          |        ........................
+   UNIT 1 -        |                      |
+          |        |     Text Sample      |
+          |        |                      |
+          |-------\........................
+           -------/|                      |
+          |        |COMMON + TYPE 1 Header|
+          |        ........................
+   UNIT 2 -        |                      |
+          |        |     Text Sample      |
+          |        |                      |
+          |        |                      |
+          ---------+----------------------+
+
+                   +----------------------+
+                   |                      |
+                   |      RTP Header      |
+                   |                      |
+          ---------+----------------------+
+          |        |   COMMON + TYPE 2    |
+          |        |   (or 3 or 4) Hdr    |
+          |        ........................
+   UNIT 3 -        |                      |
+          |        | Text Sample Fragment |
+          |        |                      |
+          |        |                      |
+          ---------+----------------------+
+
+ Figure 2. Example RTP packets
+
+4.1.1. Common Payload Header Fields
+
+ The fields common to all payload headers have the following format:
+
+    0                   1                   2
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |U|   R   |TYPE |             LEN               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 3. Common payload header fields
+
+ Where:
+
+ o U (1 bit) "UTF Transformation flag": This is used to inform RTP
+ receivers whether UTF-8 (U=0) or UTF-16 (U=1) was used to encode
+ the text string. UTF-16 text strings transported by this payload
+ format MUST be serialized in big endian order, a.k.a. network byte
+ order.
+
+ Informative note: Timed text clients complying with the 3GPP
+ Timed Text format [1] are only required to understand the big
+ endian serialization. Thus, in order to ease interoperability,
+ the reverse serialization (little endian) is not supported by
+ this payload format.
+
+ For the payload formats defined in this document, the U bit is only
+ used in TYPE 1 and TYPE 2 headers. Senders MUST set the U bit to
+ zero in TYPE 3, TYPE 4, and TYPE 5 headers. Consequently,
+ receivers MUST ignore the U bit in TYPE 3, TYPE 4, and TYPE 5
+ headers.
+
+ o R (4 bits) "Reserved bits": for future extensions. This field MUST
+ be set to zero (0x0) and MUST be ignored by receivers.
+
+ o TYPE (3 bits) "Type Field": This field specifies which specific
+ header fields follow. The following TYPE values are defined:
+
+ - TYPE 1, for a whole text sample.
+ - TYPE 2, for a text string fragment (without modifiers).
+ - TYPE 3, for a whole modifier box or the first fragment of a
+ modifier box.
+ - TYPE 4, for a modifier fragment other than first.
+ - TYPE 5, for a sample description. Exactly one header per
+ sample description.
+ - TYPE 0, 6, and 7 are reserved for future extensions. Note
+ that future extensions are possible, e.g., a unit that
+ explicitly signals the number of characters present in a
+
+ fragment (see Section 2.5). In order to guarantee backwards-
+ compatibility, it SHALL be possible that older clients ignore
+ (newer) units they do not understand, without invalidating the
+ timestamp calculation mechanisms or otherwise preventing them
+ from decoding the other units.
+
+ o LEN (16 bits) "Length Field": This field indicates the size (in
+ bytes) of the LEN field itself plus all the fields that follow it in
+ the unit, i.e., the type-specific header fields and the unit payload:
+ text strings and modifiers (if any). Only the initial U/R/TYPE byte
+ of the common header is excluded. The LEN field follows network byte
+ order.
+
+ The way in which LEN is obtained when streaming out of a 3GP file
+ depends on the particular unit type. This is explained for each
+ unit in the sections below.
+
+ For live streaming, both sample length and the LEN value for the
+ current fragment MUST be calculated during the sampling process or
+ during fragmentation.
+
+ In general, LEN may take the following values:
+
+ - TYPE = 1, LEN >= 8
+ - TYPE = 2, LEN > 9
+ - TYPE = 3, LEN > 6
+ - TYPE = 4, LEN > 6
+ - TYPE = 5, LEN > 3
+
+ Receivers MUST discard units that do not comply with these values.
+ However, the RTP header fields and the rest of the units in the
+ payload (if any) are still useful, as guaranteed by the requirement
+ for future extensions above.
+
+ In the following subsections the different payload headers for the
+ values of TYPE are specified.
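+
+ As an illustration only, the common header fields and the LEN limits
+ listed above could be read as in the following Python sketch. The
+ names parse_common_header, is_acceptable, unit_size, and MIN_LEN are
+ hypothetical and not part of this specification. The sketch assumes
+ that U occupies the most significant bit of the first byte, as
+ suggested by the header figures.
+
```python
import struct

# Smallest LEN accepted for each defined TYPE (from the list above):
# TYPE 1 allows LEN >= 8; the other bounds are strict ("greater than").
MIN_LEN = {1: 8, 2: 10, 3: 7, 4: 7, 5: 4}


def parse_common_header(unit: bytes):
    """Return (U, R, TYPE, LEN) read from the first three bytes of a unit."""
    if len(unit) < 3:
        raise ValueError("a unit starts with a 3-byte common header")
    first, length = struct.unpack("!BH", unit[:3])  # LEN: network byte order
    u = first >> 7             # 1 bit (assumed to be the top bit)
    r = (first >> 3) & 0x0F    # 4 reserved bits, ignored by receivers
    unit_type = first & 0x07   # 3 bits
    return u, r, unit_type, length


def is_acceptable(unit_type: int, length: int) -> bool:
    """True if TYPE is one of 1..5 and LEN respects the bound for that TYPE."""
    return unit_type in MIN_LEN and length >= MIN_LEN[unit_type]


def unit_size(length: int) -> int:
    """Bytes occupied by the whole unit; LEN omits only the U/R/TYPE byte."""
    return length + 1
```
+
+ A receiver could, for example, use unit_size to step over a unit of
+ a reserved TYPE and continue with the next unit in the payload.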
+
+4.1.2. TYPE 1 Header
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |U|   R   |TYPE |       LEN  (always >=8)       |    SIDX       |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                      SDUR                     |     TLEN      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |      TLEN     |
+   +-+-+-+-+-+-+-+-+
+
+ Figure 4. TYPE 1 Header Format
+
+ This header type is used to transport whole text samples. This unit
+ should be the most common case, i.e., the text sample should usually
+ be small enough to be transported in one unit without having to
+ separate text strings from modifiers. In an aggregate (RTP packet)
+ payload containing several text samples, every sample is preceded by
+ its own TYPE 1 header (see Figure 12).
+
+ Informative note: As indicated in Section 3, "Terminology", a
+ text sample is composed of the text strings followed by the
+ modifiers (if any). This is also how text samples are stored in
+ 3GP files. The separation of a text sample into text strings
+ and modifiers is only needed for large samples (or small
+ available IP MTU sizes; see Section 4.4), and it is accomplished
+ with TYPE 2 and TYPE 3 headers, as explained in the sections
+ below.
+
+ Note also that empty text samples are considered whole text samples,
+ although they do not contain sample contents. Empty text samples may
+ be used to clear the display or to put an end to samples of unknown
+ duration, for example. Units without sample contents SHALL have a
+ LEN field value of 8 (0x0008).
+
+ The fields above have the following meaning:
+
+ o U, R, and TYPE, as defined in Section 4.1.1.
+
+ o LEN, in this case, represents the length of the (complete) text
+ sample plus eight (8) bytes of headers. For finding the length of
+ the text sample in the Sample Size Box of 3GP files, see Section
+ 4.3.
+
+ o SIDX (8 bits) "Text Sample Entry Index": This is an index used to
+ identify the sample descriptions.
+
+ The SIDX field is used to find the sample description corresponding
+ to the unit's payload. There are two types of SIDX values: static
+ and dynamic.
+
+ Static SIDX values are used to identify sample descriptions that
+ MUST be sent out-of-band and MUST remain active during the whole
+ session. A static SIDX value is unequivocally linked to one
+ particular sample description during the whole session. Carrying
+ many sample descriptions out-of-band SHOULD be avoided, since these
+ may become large and, ultimately, transport is not the goal of the
+ out-of-band channel. Thus, this feature is RECOMMENDED for
+ transporting those sample descriptions that provide a set of
+ minimum default format settings. Static SIDX values MUST fall in
+ the (closed) interval [129,254].
+
+ Dynamic SIDX values are used for sample descriptions sent in-band.
+ Sample descriptions MAY be sent in-band for several reasons:
+ because they are generated in real time, for transport resiliency,
+ or both. A dynamic SIDX value is unequivocally linked to one
+ particular sample description during the period in which this is
+ active in the session, and it SHALL NOT be modified during that
+ period. This period MAY be smaller than or equal to the session
+ duration. This period is not known a priori. A maximum of 64
+ dynamic simultaneously active SIDX values is allowed at any moment.
+ Dynamic SIDX values MUST fall in the closed interval [0,127]. This
+ should be enough for both recorded content and live streaming
+ applications. Nevertheless, a wraparound mechanism is provided in
+ Section 4.2.1 to handle streaming sessions where more than 64 SIDX
+ values might be needed. Servers MAY make use of dynamic sample
+ descriptions. Clients MUST be able to receive and interpret
+ dynamic sample descriptions.
+
+ Finally, SIDX values 128 and 255 are reserved for future use.
+
+ o SDUR (24 bits) "Text Sample Duration": indicates the sample
+ duration in RTP timestamp units of the text sample. For this
+ field, a length of 3 bytes is preferred to 2 bytes. This is
+ because, for a typical clockrate of 1000 Hz, 16 bits would allow
+ for a maximum duration of just 65 seconds, which might be too short
+ for some streams. On the other hand, 24 bits at 1000 Hz allow for
+ a maximum duration of about 4.6 hours, while for 90 kHz, this value
+ is about 3 minutes. These values should be enough for streaming
+ applications. However, if a larger duration is needed, the
+ extension mechanism specified in Section 4.3 SHALL be used.
+
+ Apart from defining the time period during which the text is
+ displayed, the duration field is also used to find the timestamp of
+ subsequent units within the aggregate RTP packet payload (if any).
+ This is explained in Section 4.6.
+
+ Text samples generally have a known duration at the time of
+ transmission. However, in some cases such as live streaming, the
+ time for which a text piece shall be presented might not be known a
+ priori. Thus, the value zero SDUR=0 (0x000000) is reserved to
+ signal unknown duration. The amount of time that a sample of
+ unknown duration is presented is determined by the timestamp of the
+ next sample that shall be displayed at the receiver: Text samples
+ of unknown duration SHALL be displayed until the next text sample
+ becomes active, as indicated by its timestamp.
+
+ The next example illustrates how units of unknown duration MUST be
+ presented. If no text sample following is available, it is an
+ implementation issue what should be displayed. For example, a
+ server could send an empty sample to clear the text box.
+
+ Example: Imagine you are in an airport watching the latest news
+ report while you wait for your plane. Airports are loud, so the
+ news report is transcribed in the lower area of the screen.
+ This area displays two lines of text: the headlines and the
+ words spoken by the news speaker. As usual, the headlines are
+ shown for a longer time than the rest. This time is, in
+ principle, unknown to the stream server, which is streaming
+ live. A headline is just replaced when the next headline is
+ received.
+
+ However, upon storing a text sample with SDUR=0 in a 3GP file, the
+ SDUR value MUST be changed to the effective duration of the text
+ sample, which MUST always be greater than zero (note that the ISO
+ file format [2] explicitly forbids a sample duration of zero). The
+ effective duration MUST be calculated as the timestamp difference
+ between the current sample (with unknown duration) and the next
+ text sample that is displayed.
+
+ Note that samples of unknown duration SHALL NOT use features that
+ require knowledge of the duration of the sample up front. Such
+ features are scrolling and karaoke in [1]. This also applies to
+ future extensions of the Timed Text format. Furthermore, only
+ sample descriptions (TYPE 5 units) MAY follow units of unknown
+ duration in the same aggregate payload. Otherwise, it would not be
+ possible to calculate the timestamp of these other units.
+
+ For text contents stored in 3GP files, see Section 4.3 for details
+ on how to extract the duration value. For live streaming, live
+ encoders SHALL assign appropriate values and units according to [1]
+ and later releases.
+
+ o TLEN (16 bits), "Text String Length", is a byte count of the text
+ string. The decoder needs the text string length in order to know
+ where the modifiers in the payload start. TLEN is not present in
+ text string fragments (TYPE 2) since it can be derived from the
+ LEN values of the fragments.
+
+ The TLEN value is obtained from the text samples as contained in
+ 3GP files. Refer to Section 4.3. For live content, the TLEN MUST
+ be obtained during the sampling process.
+
+ o Finally, the actual text sample is placed after the TLEN field. As
+ defined in Section 3, a text sample consists of a string of
+ characters encoded using either UTF-8 or UTF-16, followed by zero
+ or more modifiers. Note also that no BOM and no byte count are
+ included in the strings carried in the payload (as opposed to text
+ samples stored in 3GP files [1]).
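+
+ The following Python sketch shows one way a receiver might split a
+ TYPE 1 unit into the fields described in this section. The names
+ parse_type1 and classify_sidx are hypothetical. The text string is
+ returned as raw bytes, since decoding depends on the U bit defined
+ in Section 4.1.1.
+
```python
import struct


def classify_sidx(sidx: int) -> str:
    """Map an 8-bit SIDX value to its category."""
    if 0 <= sidx <= 127:
        return "dynamic"
    if 129 <= sidx <= 254:
        return "static"
    if sidx in (128, 255):
        return "reserved"
    raise ValueError("SIDX is an 8-bit field")


def parse_type1(unit: bytes) -> dict:
    """Split one TYPE 1 unit into header fields and sample contents."""
    if len(unit) < 9:
        raise ValueError("a TYPE 1 unit has at least 9 bytes")
    first, length, sidx = struct.unpack("!BHB", unit[:4])
    if first & 0x07 != 1 or length < 8:
        raise ValueError("not a valid TYPE 1 unit")
    sdur = int.from_bytes(unit[4:7], "big")    # 24-bit duration
    (tlen,) = struct.unpack("!H", unit[7:9])   # text string byte count
    sample = unit[9:1 + length]                # LEN omits the first byte
    if len(sample) != length - 8 or tlen > len(sample):
        raise ValueError("LEN or TLEN does not match the unit contents")
    return {
        "sidx": sidx,
        "sidx_kind": classify_sidx(sidx),
        "duration": sdur if sdur != 0 else None,  # SDUR=0: unknown duration
        "text": sample[:tlen],                    # raw UTF-8/UTF-16 bytes
        "modifiers": sample[tlen:],               # zero or more modifiers
    }
```
+
+ An empty text sample (LEN=8) yields empty "text" and "modifiers"
+ values in this sketch.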
+
+4.1.3. TYPE 2 Header
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |U|   R   |TYPE |       LEN( always >9)         | TOTAL | THIS  |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                      SDUR                     |     SIDX      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |             SLEN              |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 5. TYPE 2 Header Format
+
+ This header type is used to transport either a whole text string or a
+ fragment of it. TYPE 2 units SHALL NOT contain modifiers. In
+ detail:
+
+ o U, R, and TYPE, as defined in Section 4.1.1.
+
+ o SIDX and SDUR, as defined in Section 4.1.2.
+
+ Note that the U, SIDX, and SDUR fields are meaningful since
+ partial text strings can also be displayed.
+
+ o The LEN field (16 bits) indicates the length of the text string
+ fragment plus nine (9) bytes of headers. Its value is calculated
+ upon fragmentation. LEN MUST always be greater than nine (0x0009).
+ Otherwise, the unit MUST be discarded.
+
+ According to the guidelines in Section 4.4, text strings MUST be
+ split at character boundaries to allow the display of text
+ fragments. Therefore, a text fragment MUST contain at least one
+ character in either UTF-8 or UTF-16. In practice, this is only a
+ formal lower bound, since fragments created by observing the
+ guidelines should be much larger.
+
+ Note also that TYPE 2 units do not contain an explicit text string
+ length, TLEN (see TYPE 1). This is because TYPE 2 units do not
+ contain any modifiers after the text string. If needed, the length
+ of the received string can be obtained using the LEN values of the
+ TYPE 2 units.
+
+ o The SLEN field (16 bits) indicates the size (in bytes) of the
+ original (whole) text sample to which this fragment belongs. This
+ length comprises the text string plus any modifier boxes present
+ (and includes neither the byte order mark nor the text string
+ length as mentioned in Section 3, "Terminology").
+
+ Regarding the text sample length: Timed text samples are not
+ generated at regular intervals, nor is there a default sample size.
+ If 3GP files are streamed, the length of the text samples is
+ calculated beforehand and included in the track itself, while for
+ live encoding it is the real-time encoder that SHALL choose an
+ appropriate size for each text sample. In this case, the amount of
+ text 'captured' in a sample depends on the text source and the
+ particular application (see examples below). Samples may, e.g., be
+ tailored to match the packet MTU as closely as possible or to
+ provide a given redundancy for the available bit rate. The
+ encoding application MUST also take into account the delay
+ constraints of the real-time session and assess whether FEC,
+ retransmission, or other similar techniques are reasonable options
+ for stream repair.
+
+ The following examples shall illustrate how a real-time encoder may
+ choose its settings to adapt to the scenario constraints.
+
+ Example: Imagine a newscast scenario, where the spoken news is
+ transcribed and synchronized with the image and voice of the
+ reporter. We assume that the news speaker talks at an average
+ speed of 5 words per second with an average word length of 5
+ characters plus one space per word, i.e., 30 characters per
+ second. We assume an available IP MTU of 576 bytes and an
+ available bitrate of 576*8 bits per second = 4.6 Kbps. We
+ assume each character can be encoded using 2 bytes in UTF-16.
+ In this scenario, several constraints may apply; for example:
+ available IP MTU, available bandwidth, allowable delay, and
+ required redundancy. If the target were to minimize the
+ packet overhead, a text sample covering 8 seconds of text
+ would be closest to the IP MTU:
+
+ IP/UDP/RTP/TYPE1 Header + (8-second text sample)
+ = 20 + 8 + 12 + 8 + (~6 chars/word * 5 words/s * 8 s * 2 bytes/char)
+ = 528 bytes < 576 bytes
+
+ For other scenarios, like lossy networks, it may happen that just
+ one packet per sample is too low a redundancy. In this case, a
+ choice could be that the encoder 'collects' text every second, thus
+ yielding text samples (TYPE 1 units) of 68 bytes, TYPE 1 header
+ included. We can, e.g., include three contiguous text samples in
+ one RTP payload: the current and last two text samples (see below).
+ This amounts to a total IP packet size of 20 + 8 + 12 + 3*(8 + 60)
+ = 244 bytes. Now, with the same available bitrate of 4.6 Kbps,
+ these 244-byte packets can be sent redundantly up to two times per
+ second:
+
+   RTP payload (1,2,3)(1,2,3) (2,3,4)(2,3,4) (3,4,5)(3,4,5) ...
+   Time:       <----1s------> <----1s------> <-----1s-----> ...
+
+ This means that each text sample is sent at least six times,
+ which should provide enough redundancy. Although not as
+ bandwidth efficient (488*8 < 528*8 < 576*8 bps) as the
+ previous packetization, this option increases the stream
+ redundancy while still meeting the delay and bandwidth
+ constraints.
+
+ Another example would be a user sending timed text from a
+ type-in area in the display. In this case, the text sample is
+ created as soon as the user clicks the 'send' button.
+ Depending on the packet length, fragmentation may be needed.
+
+ In a video conferencing application, text is synchronized with
+ audio and video. Thus, the text samples shall be displayed
+ long enough to be read by a human, shall fit in the video
+ screen, and shall 'capture' the audio contents rendered during
+ the time the corresponding video and audio is rendered.
+
+ For stored content, see Section 4.3 for details on how to find the
+ SLEN value in a 3GP file. For live content, the SLEN MUST be
+ obtained during the sampling process.
+
+ Finally, note that clients MAY use SLEN to reserve buffer space for
+ the remaining fragments of a text sample.
+
+ o The fields TOTAL (4 bits) and THIS (4 bits) indicate the total
+ number of fragments in which the original text sample (i.e., the
+ text string and its modifiers) has been fragmented and the position
+ of the current fragment in that sequence, respectively. Note
+ that the sequence number alone cannot replace the functionality of
+ the THIS field, since packets (and fragments) may be repeated,
+ e.g., as in repeated transmission (see Section 5). Thus, an
+ indication for "fragment offset" is needed.
+
+ The usual "byte offset" field is not used here for two reasons: a)
+ it would take one more byte and b) it does not provide any
+ information on the character offset. UTF-8/UTF-16 text strings
+ have, in general, a variable character length ranging from 1 to 4
+ bytes. Therefore, the TOTAL/THIS solution is preferred. It could
+ also be argued that the LEN and SLEN fields be used for this
+ purpose, but while they would provide information about the
+ completeness of the text sample, they do not specify the order of
+ the fragments.
+
+ In all cases (TYPEs 2, 3 and 4), if the value of THIS is greater
+ than TOTAL or if TOTAL equals zero (0x0), the fragment SHALL be
+ discarded.
+
+ o Finally, the sample contents following the SLEN field consist of a
+ fragment of the UTF-8/UTF-16 character string; no modifiers follow.
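+
+ A minimal Python sketch of how a receiver might read a TYPE 2 unit
+ and apply the checks of this section is given below. The name
+ parse_type2 is hypothetical. The sketch assumes that TOTAL occupies
+ the upper four bits and THIS the lower four bits of their shared
+ byte, following the field order in Figure 5.
+
```python
import struct


def parse_type2(unit: bytes) -> dict:
    """Split one TYPE 2 unit into header fields and text fragment."""
    if len(unit) < 11:  # 10 header bytes plus at least one text byte
        raise ValueError("truncated TYPE 2 unit")
    first, length, frag = struct.unpack("!BHB", unit[:4])
    total, this = frag >> 4, frag & 0x0F      # assumed nibble order
    if first & 0x07 != 2 or length <= 9:
        raise ValueError("not a valid TYPE 2 unit")
    if total == 0 or this > total:
        raise ValueError("invalid TOTAL/THIS, fragment is discarded")
    sdur = int.from_bytes(unit[4:7], "big")   # 24-bit duration
    sidx, slen = struct.unpack("!BH", unit[7:10])
    text = unit[10:1 + length]                # LEN omits the first byte
    if len(text) != length - 9:
        raise ValueError("LEN does not match the unit contents")
    return {
        "u": first >> 7,
        "total": total,
        "this": this,
        "duration": sdur,
        "sidx": sidx,
        "slen": slen,      # size of the whole original text sample
        "text": text,      # raw bytes of this text string fragment
    }
```
+
+ The length of the received text string can then be obtained by
+ adding up len(text), i.e., LEN minus nine, over all TYPE 2 units of
+ the sample.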
+
+4.1.4. TYPE 3 Header
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |U|   R   |TYPE |       LEN( always >6)         | TOTAL | THIS  |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                      SDUR                     |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 6. TYPE 3 Header Format
+
+ This header type is used to transport either the entire modifier
+ contents present in a text sample or just the first fragment of them.
+ This depends on whether the modifier boxes fit in the current RTP
+ payload.
+
+ If a text sample containing modifiers is fragmented, this header MUST
+ be used to transport the first fragment or, if possible, the complete
+ modifiers.
+
+ In detail:
+
+ o The U, R, and TYPE fields are defined as in Section 4.1.1.
+
+ o LEN indicates the length of the modifier contents plus six (6)
+ bytes of headers. Its value is obtained upon fragmentation.
+ Additionally, the LEN field MUST be greater than six (0x0006).
+ Otherwise, the unit MUST be discarded.
+
+ o The TOTAL/THIS field has the same meaning as for TYPE 2.
+
+ For TYPE 3 units containing the last (trailing) modifier fragment,
+ the value of TOTAL MUST be equal to that of THIS (TOTAL=THIS). In
+ addition, TOTAL=THIS MUST be greater than one, because the total
+ number of fragments of a text sample is logically always larger
+ than one.
+
+ Otherwise, if TOTAL is different from THIS in a TYPE 3 unit, this
+ means that the unit contains the first fragment of the modifiers.
+
+ o SDUR has the same definition as for TYPE 1. Since the fragments
+ are always transported in their own RTP packets, this field is only
+ needed to know how long this fragment is valid. This may, e.g., be
+ used to determine how long it should be kept in the display buffer.
+
+ Note that the SLEN and SIDX fields are not present in TYPE 3 unit
+ headers. This is because a) these fragments do not contain text
+ strings and b) these types of fragments are applied over text string
+ fragments, which already contain this information.
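+
+ The rules above could be checked as in the following Python sketch.
+ The name parse_modifier_unit is hypothetical. The sketch also
+ accepts TYPE 4 units, which share this layout (Section 4.1.5), and
+ it assumes that TOTAL occupies the upper four bits of the TOTAL/THIS
+ byte.
+
```python
import struct


def parse_modifier_unit(unit: bytes) -> dict:
    """Split one TYPE 3 or TYPE 4 unit into header fields and contents."""
    if len(unit) < 8:  # 7 header bytes plus at least one content byte
        raise ValueError("truncated unit")
    first, length, frag = struct.unpack("!BHB", unit[:4])
    unit_type = first & 0x07
    total, this = frag >> 4, frag & 0x0F      # assumed nibble order
    if unit_type not in (3, 4) or length <= 6:
        raise ValueError("not a valid TYPE 3 or TYPE 4 unit")
    if total == 0 or this > total:
        raise ValueError("invalid TOTAL/THIS, fragment is discarded")
    if unit_type == 3 and total == this and total <= 1:
        raise ValueError("TOTAL=THIS must be greater than one")
    contents = unit[7:1 + length]             # LEN omits the first byte
    if len(contents) != length - 6:
        raise ValueError("LEN does not match the unit contents")
    return {
        "type": unit_type,
        "total": total,
        "this": this,
        "duration": int.from_bytes(unit[4:7], "big"),
        "contents": contents,
        "is_last": total == this,   # trailing fragment of the sample
    }
```
+
+ For a TYPE 3 unit, is_last being False means that the unit carries
+ only the first fragment of the modifiers.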
+
+4.1.5. TYPE 4 Header
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |U|   R   |TYPE |       LEN( always >6)         | TOTAL | THIS  |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                      SDUR                     |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 7. TYPE 4 Header Format
+
+ This header type is placed before modifier fragments, other than the
+ first one.
+
+ The U, R, and TYPE fields are used as per Section 4.1.1.
+
+ As for TYPE 3, LEN indicates the length of the modifier contents
+ plus six (6) bytes of headers, and its value SHALL also be obtained
+ upon fragmentation. The LEN field MUST be greater than six (0x0006).
+ Otherwise, the unit MUST be discarded.
+
+ TOTAL/THIS is used as in TYPE 2.
+
+ The SDUR field is defined as in TYPE 1. The reasoning behind the
+ absence of SLEN and SIDX is the same as in TYPE 3 units.
+
+4.1.6. TYPE 5 Header
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |U|   R   |TYPE |       LEN( always >3)         |    SIDX       |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 8. TYPE 5 Header Format
+
+ This header type is used to transport (dynamic) sample descriptions.
+ Every sample description MUST have its own TYPE 5 header.
+
+ The U, R, and TYPE fields are used as per Section 4.1.1.
+
+ The LEN field indicates the length of the sample description, plus
+ three (3) bytes accounting for the SIDX field and the LEN field
+ itself. Thus, this field MUST be greater than three (0x0003).
+ Otherwise, the unit MUST be discarded.
+
+ If the sample is streamed from a 3GP file, the length of the sample
+ description contents (i.e., what comes after SIDX in the unit itself)
+ is obtained from the file (see Section 4.3).
+
+ The SIDX field contains a dynamic SIDX value assigned to the sample
+ description carried as sample content of this unit. As only dynamic
+ sample descriptions are carried using TYPE 5, the possible SIDX
+ values are in the (closed) interval [0,127].
+
+ Senders MAY make use of TYPE 5 units. All receivers MUST implement
+ support for TYPE 5 units, since it adds minimal complexity and may
+ increase the robustness of the streaming session.
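+
+ On the sender side, a TYPE 5 unit could be assembled as in the
+ following Python sketch. The name build_type5 is hypothetical; the
+ sample description is treated as an opaque byte string.
+
```python
import struct


def build_type5(sidx: int, description: bytes) -> bytes:
    """Prefix a sample description with a TYPE 5 header."""
    if not 0 <= sidx <= 127:
        raise ValueError("TYPE 5 units carry dynamic SIDX values only")
    if not description:
        raise ValueError("LEN must be greater than three")
    length = len(description) + 3      # LEN field (2) + SIDX (1) + contents
    if length > 0xFFFF:
        raise ValueError("sample description too large for a 16-bit LEN")
    first = 5                          # U=0, R=0, TYPE=5
    return struct.pack("!BHB", first, length, sidx) + description
```
+
+ The resulting unit is LEN + 1 bytes long, since LEN does not count
+ the initial U/R/TYPE byte.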
+
+ The next section specifies how SIDX values are calculated.
+
+4.2. Buffering of Sample Descriptions
+
+ The buffering of sample descriptions is a matter of the client's
+ timed text codec implementation. In order to work properly, this
+ payload format requires that:
+
+ o Static sample descriptions MUST be buffered at the client, at
+ least, for the duration of the session.
+
+ o If dynamic sample descriptions are used, their buffering and
+ update of the SIDX values MUST follow the mechanism described in
+ the next section.
+
+4.2.1. Dynamic SIDX Wraparound Mechanism
+
+ The use of dynamic sample descriptions by senders is OPTIONAL.
+ However, if they are used, senders MUST implement this mechanism.
+ Receivers MUST always implement it.
+
+ Dynamic SIDX values remain active either during the entire duration
+ of the session (if used just once) or in different intervals of it
+ (if used more than once).
+
+ Note: In the following, SIDX means dynamic SIDX.
+
+ For choosing the wraparound mechanism, the following rationale was
+ used: There are 128 dynamic SIDX values possible, [0..127]. If one
+ chooses to allow a maximum of 127 to be used as dynamic SIDXs, then
+ any reordered packet with a new sample description would make the
+ mechanism fail. For example, if the last packet received is SIDX=5,
+ then all values except SIDX=6 (127 in total) would be "active".
+ Now, if a reordered packet arrives with a new description, SIDX=9,
+ it will be mistakenly discarded, because SIDX=9 is, at that moment,
+ marked as "active" and active sample descriptions shall not be
+ rewritten.
+ Therefore, a "guard interval" is introduced. This guard interval
+ reduces the number of active SIDXs at any point in time to 64.
+ Although most timed text applications will probably need fewer than
+ 64 sample descriptions during a session (in total), a wraparound
+ mechanism to handle the need for more is described here.
+
+ To this end, a sliding window of 64 active SIDX values is used. Values
+ within the window are "active"; all others are marked "inactive". An
+ SIDX value becomes active if at least one sample description
+ identified by that SIDX has been received. Since sample descriptions
+ MAY be sent redundantly, it is possible that a client receives a
+ given SIDX several times. However, active sample descriptions SHALL
+ NOT be overwritten: The receiver SHALL ignore redundant sample
+ descriptions and it MUST use the already cached copy. The "guard
+ interval" of (64) inactive values ensures that the correct
+ association SIDX <-> sample description is always used.
+
+ Informative note: As for the "guard interval" value itself, 64
+ as 128/2 was considered simple enough while still meeting the
+ expected maximum number of sample descriptions. Besides that,
+ there's no other motivation for choosing 64 or a different
+ value.
+
+ The following algorithm is used to buffer dynamic sample descriptions
+ and to maintain the dynamic SIDX values:
+
+ Let X be the last SIDX received that updated the range of active
+ sample descriptions. Let Y be a value within the allowed range for
+ dynamic SIDX: [0,127], and different from X. Let Z be the SIDX of
+ the last received sample description. Then:
+
+ 1. Initialize all dynamic SIDX values as inactive. For stored
+ contents, read the sample description index in the Sample to
+ Chunk box ("stsc") for that sample. For live streaming, the
+ first value MAY be zero or any other value in the interval
+ above. Go to step 2.
+
+ 2. The first in-band sample description, with SIDX=Z, is received
+ and stored; set X=Z. Go to step 3.
+
+ 3. Any SIDX within the interval [X+1 modulo(128), X+64 modulo(128)]
+ is marked as inactive, and any corresponding sample description
+ is deleted. Any SIDX within the interval [X+65 modulo(128), X]
+ is set active. Go to step 4 (wait state).
+
+ 4. Wait for next sample description. Once the client is
+ initialized, the interval of active SIDX values MUST change
+ whenever a sample description with an SIDX value in the inactive
+ set is received. That is, upon reception of a sample
+ description with SIDX=Z, do the following:
+
+ a. If Z is in the (closed) interval [X+1 modulo(128), X+64
+ modulo(128)] then set X=Z, store the sample description, and
+ go to step 3.
+
+ b. Else, Z must be in the interval [X+65 modulo(128), X], thus:
+
+ i. If SIDX=Z is not stored, then store the sample
+ description. Go to beginning of step 4 (wait state).
+ ii. Else, go to the beginning of step 4 (wait state).
+
+ Informative note: Any SIDX value in the interval [0,127] is
+ allowed to be sent. For example, if [64..127] is the
+ current active set and SIDX=0 is sent, a new sample description
+ is defined (0) and an old one deleted (64); thus [65..127] and
+ [0] are active. Similarly, one could now send SIDX=64, thus
+ inverting the active and inactive sets.
+
+ Example:
+ If X=4, any SIDX in the interval [5,68] is inactive. Active
+ SIDX values are in the complementary interval [69,127] plus
+ [0,4]. For example, if the client receives a SIDX=6, then the
+ active interval is now different: [0,6] plus [71,127]. If the
+ received SIDX is in the current active interval, no change SHALL
+ be applied.
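+
+ The algorithm of this section can be sketched in Python as follows.
+ The class DynamicSidxCache and its method names are hypothetical;
+ sample descriptions are stored as opaque byte strings. The sketch
+ relies on the fact that Z lies in [X+1 modulo(128), X+64
+ modulo(128)] exactly when (Z - X) modulo 128 is between 1 and 64.
+
```python
class DynamicSidxCache:
    """Receiver-side cache of in-band sample descriptions (steps 1-4)."""

    def __init__(self):
        self.x = None       # last SIDX that moved the active window
        self.stored = {}    # SIDX -> sample description bytes

    def _is_inactive(self, z: int) -> bool:
        # [X+1, X+64] modulo 128 is the inactive ("guard") interval.
        return 1 <= (z - self.x) % 128 <= 64

    def is_active(self, z: int) -> bool:
        return self.x is not None and not self._is_inactive(z)

    def receive(self, z: int, description: bytes) -> bool:
        """Process a sample description with SIDX=z; True if it was stored."""
        if not 0 <= z <= 127:
            raise ValueError("dynamic SIDX values lie in [0,127]")
        if self.x is None or self._is_inactive(z):
            # Steps 2, 3, and 4a: move the window, then delete every
            # description that now falls into the inactive interval.
            self.x = z
            for sidx in list(self.stored):
                if self._is_inactive(sidx):
                    del self.stored[sidx]
            self.stored[z] = description
            return True
        if z not in self.stored:    # step 4b.i
            self.stored[z] = description
            return True
        return False                # step 4b.ii: keep the cached copy
```
+
+ With X=4, this sketch reports SIDX values 5 to 68 as inactive; after
+ a sample description with SIDX=6 is received, values 7 to 70 are
+ inactive, as in the example above.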
+
+4.3. Finding Payload Header Values in 3GP Files
+
+ For the purpose of streaming timed text contents, some values in the
+ boxes contained in a 3GP file are mapped to fields of this payload
+ header. This section explains where to find those values.
+
+ Additionally, for the duration and sample description indexes,
+ extension mechanisms are provided. All senders MUST implement the
+ extension mechanisms described herein.
+
+ If the content is streamed out of a 3GP file, the following guidelines
+ SHALL be followed.
+
+ Note: All fields in the objects (boxes) of a 3GP file are found
+ in network byte order.
+
+ Information obtained from the Sample Table Box (stbl):
+
+ o Sample Descriptions and Sample Description length: The Sample
+ Description box (stsd, inside the stbl) contains the sample
+ descriptions. For timed text media, each element of stsd is a
+ timed text sample entry (type "tx3g").
+
+ The (unsigned) 32 bits of the "size" field in the stsd box
+ represent the length (in bytes) of the sample description, as
+ carried in TYPE 5 units. On the other hand, the LEN field of
+ TYPE 5 units is restricted to 16 bits. Therefore, if the
+ value of "size" is greater than (2^16-1-3)[bytes], then the
+ sample description SHALL NOT be streamed with this payload
+ format. There is no extension mechanism defined in this case,
+ since fragmentation of sample descriptions is not defined
+ (sample descriptions are typically up to some 200 bytes in
+ size). Note: The three (3) accounts for the TYPE 5 header
+ fields included in the LEN value.
+
+ o SDUR from the Decoding Time to Sample Box (stts). The
+ (unsigned) 32 bits of the "sample delta" field are used for
+ calculating SDUR. However, since the SDUR field is only 3
+ bytes long, text samples with duration values larger than
+ (2^24-1)/(timestamp clockrate)[seconds] cannot be streamed
+ directly. The solution is simple: Copies of the corresponding
+ text sample SHALL be sent. The timestamp and duration values
+ of the copies SHALL be adjusted so that a continuous display
+ is guaranteed as if just one sample had been sent.
+ That is, a sample with timestamp TS and duration SDUR can be
+ sent as two samples having timestamps TS1 and TS2 and
+ durations SDUR1 and SDUR2, such that TS1=TS, TS2=TS1+SDUR1,
+ and SDUR=SDUR1+SDUR2.
+
+ o Text sample length from the Sample Size Box (stsz). The
+ (unsigned) 32 bits of the "sample size" or "entry size" (one
+ of them, depending on whether the sample size is fixed or
+ variable) indicate the length (in bytes) of the 3GP text
+ sample. To obtain the length of the (actual) streamed
+ text sample, the length of the text string byte count (2
+ bytes) and, in case of UTF-16 strings, the length of the BOM
+ (also 2 bytes) SHALL be deducted. This is illustrated in
+ Figure 9.
+
+          Text Sample according to 3GPP TS 26.245
+
+                   TEXT SAMPLE (length=stsz)
+      .--------------------------------------------------.
+     /                                                    \
+                TEXT STRING (length=TBC)
+         .------------------------------------.
+        /                                      \
+     TBC BOM                                     MODIFIERS
+    +---+---+----------------------------------+-----------+
+            ||
+            ||         TBC BOM   -> TLEN field
+            ||        +---+---+     U bit
+            ||
+            \/
+
+          Text Sample according to this Payload Format
+
+             TEXT SAMPLE (length=SLEN w/o TBC,BOM)
+         .--------------------------------------------.
+        /                                              \
+             TEXT STRING (length=TLEN)
+         .--------------------------------.
+        /                                  \
+                    TEXT STRING              MODIFIERS
+        +----------------------------------+-----------+
+
+    KEY:
+    TBC = Text string Byte Count
+    BOM = Byte Order Mark
+
+ Figure 9. Text sample composition
+
+ Moreover, since the LEN field in the TYPE 1 unit header is 16
+ bits long, text samples larger than (2^16-1-8) [bytes] SHALL
+ NOT be streamed. In this case, too, no extension mechanism
+ is defined, because this maximum is considered enough
+ for the targeted streaming applications. (Note: The eight (8)
+ accounts for the TYPE 1 header fields included in the LEN
+ value.)
+
+ o SIDX from the Sample to Chunk Box (stsc): The stsc Box is used
+ to find samples and their corresponding sample descriptions.
+ These are referenced by the "sample description index", a
+ 32-bit (unsigned) integer. If possible, these indices may be
+ directly mapped to the SIDX field. However, there are several
+ cases where this may not be possible:
+
+ a) The total number of indices used is greater than
+ the number of indices available, i.e., if the static
+ sample descriptions are more than 126 or the dynamic ones
+ are more than 64.
+
+ b) The original SIDX value ranges do not fit in the
+ allowed ranges for static (129-254) or dynamic (0-127)
+ values.
+
+ Therefore, when assigning SIDX values to the sample
+ descriptions, the following guidelines are provided:
+
+ o Static sample descriptions can simply be assigned
+ consecutive values within the range 129-254 (closed
+ interval). This range should be more than enough for static
+ sample descriptions.
+
+ o As for dynamic sample descriptions:
+
+ a) Streams that use fewer than 64 dynamic sample
+ descriptions SHOULD use consecutive values for SIDX
+ anywhere in the range 0-127 (closed interval).
+
+ b) For streams with more than 64 sample descriptions,
+ the SIDX values MUST be assigned in usage order, and if
+ any sample description shall be used after it has been
+ set inactive, it will need to be re-sent and assigned a
+ new SIDX value (according to the algorithm in Section
+ 4.2.1).
+
+ Information obtained from the Media Data Box:
+
+ o Text strings, TLEN, U bit, and modifiers from the Media Data
+ Box (mdat). Text strings, 16-bit text string byte count, Byte
+ Order Mark (BOM, indicating UTF encoding), and modifier boxes
+ can be found here.
+
+ For TYPE 1 units, the value of TLEN is extracted from the text
+ string byte count that precedes the text string in the text
+ sample, as stored in the 3GP file. If UTF-16 encoding is
+ used, two (2) more bytes have to be deducted from this byte
+ count beforehand, in order to exclude the BOM. See Figure 9.
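+
+ The derivation of TLEN and the U bit described above can be sketched
+ as follows. This is only an illustration: the function name is
+ hypothetical, the mapping U=0 for UTF-8 and U=1 for UTF-16 is assumed
+ from the header definition in Section 4.1, and the size check applies
+ the (2^16-1-8) limit mentioned earlier in this section.
+

```python
import struct

UTF16_BOM = b"\xfe\xff"            # big-endian BOM, the only one supported
MAX_SAMPLE_SIZE = 2**16 - 1 - 8    # limit imposed by the 16-bit LEN field


def split_stored_text_sample(sample: bytes):
    """Return (u_bit, tlen, text, modifiers) for one stored text sample.

    Hypothetical helper: `sample` holds the 16-bit text string byte
    count, the text string (with BOM if UTF-16), and any modifier boxes.
    """
    if len(sample) < 2:
        raise ValueError("sample is too short to hold the byte count")
    (byte_count,) = struct.unpack_from(">H", sample, 0)
    if 2 + byte_count > len(sample):
        raise ValueError("text string byte count exceeds the sample size")

    text = sample[2:2 + byte_count]
    modifiers = sample[2 + byte_count:]

    u_bit = 0                      # assumption: U=0 signals UTF-8
    if text.startswith(UTF16_BOM):
        u_bit = 1                  # assumption: U=1 signals UTF-16
        text = text[2:]            # deduct the two BOM bytes

    tlen = len(text)
    if tlen + len(modifiers) > MAX_SAMPLE_SIZE:
        raise ValueError("text sample is too large to be streamed")
    return u_bit, tlen, text, modifiers
```

+ The returned text and modifiers are what would be carried in the
+ unit; the byte count and the BOM themselves are not transported.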
+
+4.4. Fragmentation of Timed Text Samples
+
+ This section explains why text samples may have to be fragmented and
+ discusses some of the possible approaches to doing it. A solution is
+ proposed together with rules and recommendations for fragmenting and
+ transporting text samples.
+
+ 3GPP Timed Text applications are expected to operate at low bitrates.
+ This fact, added to the small size of timed text samples (typically
+ one or two hundred bytes) makes fragmentation of text samples a rare
+ event. Samples should usually fit into the MTU of the network path
+ in use.
+
+ Nevertheless, some text strings (e.g., ending roll in a movie) and
+ some modifier boxes (i.e., for hyperlinks, for karaoke, or for
+ styles) may become large. This may also apply to future modifier
+ boxes. In such cases, the first option to consider is whether it is
+ possible to adjust the encoding (e.g., the size of sample) in such a
+ way that fragmentation is avoided. If it is, this is preferred to
+ fragmentation and SHOULD be done.
+
+ Otherwise, if this is not possible or other constraints prevent it,
+ fragmentation MAY be used, and the basic guidelines given in this
+ document MUST be followed:
+
+ o It is RECOMMENDED that text samples be fragmented as seldom as
+ possible, i.e., the least possible number of fragments is created
+ out of a text sample.
+
+ o If there is some bitrate and free space in the payload available,
+ sample descriptions (if at hand) SHOULD be aggregated.
+
+ o Text strings MUST be split at character boundaries; see the TYPE 2
+ header. Otherwise, it is not possible to display the text contents
+ of a fragment if a previous fragment was lost. As a consequence, text
+ string fragmentation requires knowledge of the UTF-8/UTF-16
+ encoding formats to determine character boundaries.
+
+ o Unlike text strings, the modifier boxes are NOT REQUIRED to be
+ split at meaningful boundaries. However, it is RECOMMENDED that
+ this be done whenever possible. This decreases the effects of
+ packet loss. This payload format does not ensure that partially
+ received modifiers are applied to text strings. If only part of
+ the modifiers is received, it is an application issue how to deal
+ with these, i.e., whether or not to use them.
+
+ Informative note: Ensuring that partially received modifiers can
+ be applied to text strings in all cases (for all modifier types
+ and for all fragment loss constellations) would place additional
+ requirements on the payload format. In particular, this would
+ require that: a) senders understand the semantics of the
+ modifier boxes and b) specific fragment headers for each of the
+ modifier boxes are defined, in addition to the payload formats
+ defined below. Understanding the modifiers semantics means
+ knowing, e.g., where each modifier starts and ends, which text
+ fragments are affected, which modifiers may or may not be split,
+ or what the fields indicate. This is necessary to be able to
+ split the modifiers in such a way that each fragment can be
+ applied independently of previous packet losses. This would
+ require a more intelligent fragmentation entity and more complex
+ headers. Given the low probability of fragmentation and the
+ desire to keep the requirements low, it does not seem reasonable
+ to specify such modifier box specific headers.
+
+ o Modifier and text string fragments SHOULD be protected against
+ packet losses, i.e., using FEC [7], retransmission [11], repetition
+ (Section 5), or an equivalent technique. This minimizes the
+ effects of packet loss.
+
+ o An additional requirement when fragmenting text samples is that the
+ start of the modifiers MUST be indicated using the payload header
+ defined for that purpose, i.e., a TYPE 3 unit MUST be used (see
+ Section 4.1.4). This enables a receiver to detect the start of the
+ modifiers as long as there are not two or more consecutive packet
+ losses.
+
+ o Finally, sample descriptions SHALL NOT be fragmented because they
+ contain important information that may affect several text samples.
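+
+ The requirement to split text strings at character boundaries can be
+ illustrated with the following sketch. It is not part of the payload
+ format; the function name and the max_len parameter are hypothetical.
+ For UTF-8, a cut is moved back while the next byte is a continuation
+ byte (10xxxxxx). For big-endian UTF-16, a cut is kept on a 16-bit
+ boundary and is never placed between the two halves of a surrogate
+ pair. "Character" is taken here to mean one encoded code point.
+

```python
def split_text_string(text: bytes, max_len: int, utf16: bool = False):
    """Split an encoded text string into fragments of at most max_len
    bytes without cutting through an encoded character."""
    if max_len < 4:
        # 4 bytes is the longest UTF-8 sequence and UTF-16 surrogate pair
        raise ValueError("max_len must be at least 4 bytes")
    fragments = []
    start = 0
    while start < len(text):
        end = min(start + max_len, len(text))
        if end < len(text):
            if utf16:
                # stay on a 16-bit code unit boundary
                end -= (end - start) % 2
                # next unit is a low surrogate (0xDC00-0xDFFF): keep the pair
                if 0xDC <= text[end] <= 0xDF:
                    end -= 2
            else:
                # back up over UTF-8 continuation bytes
                while end > start and text[end] & 0xC0 == 0x80:
                    end -= 1
            if end <= start:
                raise ValueError("malformed text string")
        fragments.append(text[start:end])
        start = end
    return fragments
```

+ Each returned fragment decodes on its own, so a receiver can still
+ display it if an earlier fragment is lost.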
+
+4.5. Reassembling Text Samples at the Receiver
+
+ The payload headers defined in this document allow reassembling
+ fragmented text samples. For this purpose, the standard RTP
+ timestamp, the duration field (SDUR), and the fields TOTAL/THIS in
+ the payload headers are used.
+
+ Units that belong to the same text sample MUST have the same
+ timestamp. TYPE 5 units do not comply with this rule since they are
+ not part of any particular text sample.
+
+ The process for collecting the different fragments (units) of a text
+ sample is as follows:
+
+ 1. Search for units having the same timestamp value, i.e., units
+ that belong to the same text sample or sample descriptions that
+ shall become available at that time instant. If several units
+ of the same sample are repeated, only one of them SHALL be used.
+ Repeated units are those that have the same timestamp and the
+ same values for TOTAL/THIS.
+
+ Note that, as mentioned in Section 4.1.1, the receiver
+ SHALL ignore units with unrecognized TYPE value.
+ However, the RTP header fields and the rest of the units
+ (if any) in the payload are still useful.
+
+ 2. Check within this set whether any of the units from the text
+ sample is missing. This is done using the TOTAL and THIS
+ fields; the TOTAL field indicates how many fragments were
+ created out of the text sample, and the THIS field indicates the
+ position of this fragment in the text sample. As a result of
+ this operation, two outcomes are possible:
+
+ a. No fragment is missing. Then, the THIS field SHALL be used
+ to order the fragments and reassemble the text sample
+ before forwarding it to the decoding application. Special
+ care SHALL be taken when reassembling the text string as
+ indicated in bullet 4 below.
+
+ b. One or more fragments are missing: Check whether the missing
+ fragments belong to the text string or to the modifiers.
+ TYPE 2 units identify text string fragments, and TYPE 3 and
+ 4 identify modifier fragments:
+
+ i. If the fragment or fragments missing belong to the text
+ string and the modifiers were received complete, then
+ the received text characters may, at least, be
+ displayed as plain text. Some modifiers may only be
+ applied as long as it is possible to identify the
+ character numbers, e.g., if only the last text string
+ fragment is lost. This is the case for modifiers
+ defining specific font styles ('styl'), highlighted
+ characters ('hlit'), karaoke feature ('krok'), and
+ blinking characters ('blnk'). Other modifiers such as
+ 'dlay' or 'tbox' can be applied without the knowledge
+ of the character number. It is an application issue to
+ decide whether or not to apply the modifiers.
+
+ ii. If the fragment missing belongs to the modifiers and
+ the text strings were received complete, then the
+ incomplete modifiers may be used. The text string
+ SHOULD at least be displayed as plain text. As
+ mentioned in Section 4.4, modifiers may be split
+ without observing meaningful boundaries. Hence, it may
+ not always be possible to make use of partially
+ received modifiers. To avoid this, it is RECOMMENDED
+ that the modifiers be split at meaningful boundaries.
+
+ iii. A third possibility is that it is not possible to
+ discern whether modifiers or text strings were received
+ complete. For example, if the TYPE 3 unit of a sample
+ plus the following or preceding packet is lost, there
+ is no way for the RTP receiver to know if one or both
+ packets lost belong to the modifiers or if there are
+ also some missing text strings. Repetition, FEC,
+ retransmission, or other protection mechanisms as per
+ section 4.6 are RECOMMENDED to avoid this situation.
+
+ iv. Finally, if it is certain that neither text strings nor
+ modifiers were received complete, then the text strings
+ and the modifiers may be rendered partially or may be
+ discarded. This is an application choice.
+
+ 3. Sample descriptions can be directly associated with the
+ reassembled text samples, via the sample description index
+ (SIDX).
+
+ 4. Reassembling of text strings: Since the text strings transported
+ in RTP packets MUST NOT include any byte order mark (BOM), the
+ receiver MUST prepend it to the reassembled UTF-16 string before
+ handing it to the timed text decoder (see Figure 9). The value
+ of the BOM is 0xFEFF because only big endian serialization of
+ UTF-16 strings is supported by this payload format.
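+
+ The collection steps above can be sketched as follows. This is a
+ simplified illustration, not a normative algorithm: the Fragment
+ record and the function name are hypothetical, THIS is assumed to
+ count from 1 (as in the figures of Section 4.7), and what to do with
+ an incomplete sample is left to the application, as stated above.
+

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical record for a received TYPE 2, 3, or 4 unit."""
    timestamp: int
    unit_type: int   # 2 = text string, 3 and 4 = modifiers
    total: int       # TOTAL field
    this: int        # THIS field
    data: bytes


def reassemble(fragments, timestamp, utf16):
    """Return (text, modifiers, missing) for the sample at `timestamp`,
    or None if no fragment of that sample was received."""
    by_position = {}
    for frag in fragments:
        if frag.timestamp != timestamp or frag.unit_type not in (2, 3, 4):
            continue
        # Step 1: repeated units share timestamp and THIS; keep one.
        by_position.setdefault(frag.this, frag)
    if not by_position:
        return None

    # Step 2: use TOTAL and THIS to detect missing fragments.
    total = next(iter(by_position.values())).total
    missing = [n for n in range(1, total + 1) if n not in by_position]

    ordered = [by_position[n] for n in sorted(by_position)]
    text = b"".join(f.data for f in ordered if f.unit_type == 2)
    modifiers = b"".join(f.data for f in ordered if f.unit_type in (3, 4))

    # Step 4: the BOM is not transported, so restore it for UTF-16.
    if utf16:
        text = b"\xfe\xff" + text
    return text, modifiers, missing
```

+ An empty "missing" list corresponds to outcome (a); otherwise the
+ caller has to decide, per outcome (b), whether the partial text and
+ modifiers are usable.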
+
+4.6. On Aggregate Payloads
+
+ Units SHOULD be aggregated to avoid overhead, whenever possible. The
+ aggregate payloads MUST comply with one of the following ordered
+ configurations:
+
+ 1. Zero or more sample descriptions (TYPE 5) followed by zero or more
+ whole text samples (TYPE 1 units). At least one unit of either
+ type MUST be present.
+
+ 2. Zero or more sample descriptions followed by zero or one modifier
+ fragment, either TYPE 3 or TYPE 4. At least one unit MUST be
+ present.
+
+ 3. Zero or more sample descriptions, followed by zero or one text
+ string fragment (TYPE 2), followed by zero or one TYPE 3 unit. If
+ a TYPE 2 unit and a TYPE 3 unit are present, then they MUST belong
+ to the same text sample. At least one unit MUST be present.
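+
+ As an illustration, the three configurations can be checked on the
+ sequence of TYPE values found in a payload. The function name is
+ hypothetical, and the sketch does not verify that a TYPE 2 unit and
+ a TYPE 3 unit belong to the same text sample.
+

```python
def is_valid_aggregate(unit_types):
    """Check a payload's unit TYPE values (in payload order) against
    the three ordered configurations."""
    unit_types = list(unit_types)
    if not unit_types:
        return False          # at least one unit must be present

    # Leading sample descriptions (TYPE 5) are allowed in all cases.
    i = 0
    while i < len(unit_types) and unit_types[i] == 5:
        i += 1
    rest = unit_types[i:]

    # Configuration 1: zero or more whole text samples (TYPE 1).
    if all(t == 1 for t in rest):
        return True
    # Configuration 2: one modifier fragment, TYPE 3 or TYPE 4.
    if rest in ([3], [4]):
        return True
    # Configuration 3: one TYPE 2, optionally followed by one TYPE 3.
    if rest in ([2], [2, 3]):
        return True
    return False
```

+ For example, [5, 1, 1, 1] and [2, 3] are accepted, whereas [1, 5]
+ and [2, 4] are rejected.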
+
+ Some observations:
+
+ o Aggregates other than the ones listed above SHALL NOT be used.
+
+ o Sample descriptions MUST be placed in the aggregate payload before
+ the occurrence of any non-TYPE 5 units.
+
+ o Correct reception of TYPE 5 units is important since their contents
+ may be referenced by several other units in the stream.
+
+ Receivers are unable to use text samples until their corresponding
+ sample descriptions are received. Accordingly, a sender SHOULD
+ send multiple copies of a sample description to ensure reliability
+ (see Section 5). Receivers MAY use payload-specific feedback
+ messages [21] to tell a sender that they have received a particular
+ sample description.
+
+ o Regarding timestamp calculation: In general, the rules for
+ calculating the timestamp of units in an aggregate payload depend
+ on the type of unit. Based on the possible constellations for
+ aggregate payloads, as above, we have:
+
+ o Sample descriptions MUST receive the RTP timestamp of the
+ packet in which they are included.
+
+ Note that for TYPE 5 units, the timestamp actually does not
+ represent the instant when they are played out, but instead
+ the instant at which they become available for use.
+
+ o For the first configuration: The first TYPE 1 unit receives
+ the RTP timestamp. The timestamp of any subsequent TYPE 1
+ unit MUST be obtained by adding the sample duration of the
+ preceding TYPE 1 unit to the timestamp of that preceding unit.
+
+ o For the second and third configuration, all units, TYPE 2,
+ 3, and 4, MUST receive the RTP timestamp.
+
+ Refer to detailed examples on the timestamp calculation
+ below.
+
+ o As per configuration 3 above, a payload MAY contain several
+ fragments of one (and only one) text sample. If it does, then
+ exactly one TYPE 2 unit followed by exactly one TYPE 3 unit is
+ allowed in the same payload. This is in line with RFC 3640 [12],
+ Section 2.4, which explicitly disallows combining fragments of
+ different samples in the same RTP payload. Note that, in this
+ special case, no timestamp calculation is needed. That is, the RTP
+ timestamp of both units is equal to the timestamp in the packet's
+ RTP header.
+
+ o Finally, note that the use of empty text samples allows for
+ aggregating non-consecutive TYPE 1 units in the same payload. Two
+ text samples, with timestamps TS1 and TS3 and durations SDUR1 and
+ SDUR3, are not consecutive if TS1+SDUR1 < TS3. A solution for this
+ is to include an empty TYPE 1 unit with timestamp TS2 = TS1+SDUR1
+ and duration SDUR2 between them, such that TS2+SDUR2 =
+ TS1+SDUR1+SDUR2 = TS3.
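+
+ The duration of the empty text sample follows directly from the
+ relation TS1+SDUR1+SDUR2 = TS3. The sketch below is illustrative
+ only; the function name is hypothetical. It additionally assumes
+ that SDUR is a 24-bit field and, as a choice of this sketch that is
+ not specified above, covers a longer gap with several empty samples.
+

```python
MAX_SDUR = 2**24 - 1   # assumption: SDUR is a 24-bit field


def empty_sample_durations(ts1, sdur1, ts3):
    """Durations of the empty TYPE 1 units needed between a sample
    starting at ts1 (duration sdur1) and a sample starting at ts3."""
    gap = ts3 - (ts1 + sdur1)       # SDUR2 = TS3 - (TS1 + SDUR1)
    if gap < 0:
        raise ValueError("the two text samples overlap")
    durations = []
    while gap > 0:
        step = min(gap, MAX_SDUR)   # one empty sample per 24-bit chunk
        durations.append(step)
        gap -= step
    return durations
```

+ An empty list means that the two samples are already consecutive
+ and no empty text sample is needed.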
+
+ Some examples of aggregate payloads are illustrated in Figure 10.
+ (Note: The figure is not scaled.)
+
+ N/A TS1 TS2 TS3
+ +------+-----+------+-----+
+ |TYPE5 |TYPE1|TYPE1 |TYPE1|
+ +------+-----+------+-----+
+ N/A sdur1 sdur2 sdur3
+
+ N/A TS4
+ +-----+-------+
+ |TYPE5| TYPE 1| a)
+ +-----+-------+
+ N/A sdur4
+
+ TS4 TS4 TS4
+ +--------------+ +--------------+
+ | TYPE2 | |TYPE2 |TYPE 3 | b)
+ +--------------+ +--------------+
+ sdur4 sdur4 sdur4
+
+ TS4 TS4
+ +--------------+ +--------------+
+ | TYPE2| TYPE 3| | TYPE4 | c)
+ +--------------+ +--------------+
+ sdur4 sdur4 sdur4
+
+ |----------PAYLOAD 1------| |--PAYLOAD 2---| |--PAYLOAD 3---|
+ rtpts1 rtpts2 rtpts3
+
+ KEY:
+ TSx = Text Sample x
+ rtptsy = the standard RTP timestamp for PAYLOAD y
+ sdurx = the duration of Text Sample x
+ N/A = not applicable
+
+ Figure 10. Example aggregate payloads
+
+ In Figure 10, four text samples (TS1 through TS4) are sent using
+ three RTP packets. These configurations have been chosen to show how
+ the 5 TYPE headers are used. Additionally, three different
+ possibilities for the last text sample, TS4, are depicted: a), b),
+ and c).
+
+ In Figure 11, option b) from Figure 10 is chosen to illustrate how
+ the timestamp for each unit is found.
+
+ N/A TS1 TS2 TS3 TS4 TS4 TS4
+ +------+-----+------+-----+ +--------------+ +--------------+
+ |TYPE5 |TYPE1|TYPE1 |TYPE1| | TYPE2 | |TYPE2 |TYPE 3 |
+ +------+-----+------+-----+ +--------------+ +--------------+
+ N/A sdur1 sdur2 sdur3 sdur4 sdur4 sdur4
+
+ (#1) (#2) (#3) (#4) (#5) (#6) (#7)
+
+ |----------PAYLOAD 1------| |--PAYLOAD 2---| |--PAYLOAD 3---|
+ rtpts1 rtpts2 rtpts3
+
+ Figure 11. Selected payloads from Figure 10
+
+ With TSx, rtptsy, and sdurx as defined in the key of Figure 10, the
+ timestamp for unit #z, ts(#z), is the sum of rtptsy and the
+ cumulative sum of the durations of the preceding units in that
+ payload (except in the case of PAYLOAD 3, as per configuration 3
+ above). Thus, we have:
+
+ 1. for the units in the first aggregate payload, PAYLOAD 1:
+
+ ts(#1) = rtpts1
+ ts(#2) = rtpts1
+ ts(#3) = rtpts1 + sdur1
+ ts(#4) = rtpts1 + sdur1 + sdur2
+
+ Note that the TYPE 5 unit and the first TYPE 1 unit both
+ have the RTP timestamp.
+
+ 2. for PAYLOAD 2:
+
+ ts(#5) = rtpts2
+
+ 3. for PAYLOAD 3:
+
+ ts(#6) = ts(#7) = rtpts2 = rtpts3
+
+ According to configuration 3 above, the TYPE 2 and the TYPE 3
+ units shall belong to the same sample. Hence, rtpts3 must be
+ equal to rtpts2. For the same reason, the value of SDUR is
+ not used to calculate the timestamp of the next unit.
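+
+ The timestamp rules worked through above can be summarized in a
+ short sketch. The function name and the (TYPE, SDUR) tuple
+ representation are hypothetical. The only addition is the wrap at
+ 2^32, which reflects the 32-bit RTP timestamp field.
+

```python
def unit_timestamps(rtp_ts, units):
    """Timestamps of the units in one aggregate payload.

    `units` is a list of (unit_type, sdur) tuples in payload order;
    sdur is None for TYPE 5 units, which carry no duration.
    """
    timestamps = []
    next_type1_ts = rtp_ts
    for unit_type, sdur in units:
        if unit_type == 1:
            # First TYPE 1 unit gets the RTP timestamp; each later one
            # gets timestamp + duration of the preceding TYPE 1 unit.
            timestamps.append(next_type1_ts)
            next_type1_ts = (next_type1_ts + sdur) % 2**32
        else:
            # TYPE 2, 3, 4, and 5 units all receive the RTP timestamp.
            timestamps.append(rtp_ts)
    return timestamps
```

+ With rtpts1 = 1000 and durations 100, 200, and 300, PAYLOAD 1 of
+ Figure 11 yields the timestamps 1000, 1000, 1100, and 1300.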
+
+4.7. Payload Examples
+
+ Some examples of payloads using the defined headers are shown below:
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                           timestamp                           |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |           synchronization source (SSRC) identifier            |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |U|   R   |TYPE1|       LEN (always >=8)        |     SIDX      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                     SDUR                      |     TLEN      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     TLEN      |                                               |
+   +---------------+                                               |
+   |                  text string (no.bytes=TLEN)                  |
+   |                                                               |
+   |                                                               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |              modifiers (no.bytes=LEN - 8 - TLEN)              |
+   |                                                               |
+   |                                                               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |U|   R   |TYPE1|       LEN (always >=8)        |     SIDX      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                     SDUR                      |     TLEN      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     TLEN      |                                               |
+   +---------------+                                               |
+   |                  text string (no.bytes=TLEN)                  |
+   |                                                               |
+   |                                                               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |              modifiers (no.bytes=LEN - 8 - TLEN)              |
+   |                                               +-+-+-+-+-+-+-+-+
+   |                                               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 12. A payload carrying two TYPE 1 units
+
+ In Figure 12, an RTP packet carrying two TYPE 1 units is depicted.
+ It can be seen how the length fields LEN and TLEN can be used to find
+ the start of the next unit (LEN), the start of the modifiers (TLEN),
+ and the length of the modifiers (LEN-8-TLEN).
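+
+ The use of LEN and TLEN to walk through such a payload can be
+ sketched as follows. The function name and the dictionary keys are
+ hypothetical. The field widths are assumed from the header
+ definitions in Section 4.1 (U: 1 bit, R: 4 bits, TYPE: 3 bits, LEN:
+ 16 bits, SIDX: 8 bits, SDUR: 24 bits, TLEN: 16 bits), with LEN
+ counting the LEN field itself and everything that follows it in the
+ unit.
+

```python
import struct


def parse_type1_units(payload: bytes):
    """Parse an RTP payload (RTP header already removed) that carries
    only TYPE 1 units."""
    units = []
    offset = 0
    while offset < len(payload):
        if len(payload) - offset < 9:
            raise ValueError("truncated TYPE 1 unit header")
        first, length, sidx = struct.unpack_from(">BHB", payload, offset)
        u_bit = first >> 7            # most significant bit
        unit_type = first & 0x07      # three least significant bits
        if unit_type != 1:
            raise ValueError("only TYPE 1 units are handled here")
        sdur = int.from_bytes(payload[offset + 4:offset + 7], "big")
        (tlen,) = struct.unpack_from(">H", payload, offset + 7)

        end = offset + 1 + length     # LEN gives the start of the next unit
        if length < 8 or 8 + tlen > length or end > len(payload):
            raise ValueError("inconsistent LEN or TLEN value")

        text_start = offset + 9
        mod_start = text_start + tlen # TLEN gives the start of the modifiers
        units.append({
            "u": u_bit,
            "sidx": sidx,
            "sdur": sdur,
            "text": payload[text_start:mod_start],
            "modifiers": payload[mod_start:end],   # LEN - 8 - TLEN bytes
        })
        offset = end
    return units
```

+ A payload that also contains TYPE 5 units would need an additional
+ branch for that header; this is omitted here for brevity.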
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                           timestamp                           |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |           synchronization source (SSRC) identifier            |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |U|   R   |TYPE5|        LEN (always >3)        |     SIDX      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                                                               |
+   |             sample description (no.bytes=LEN - 3)             |
+   |                                                               |
+   |                                                               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |U|   R   |TYPE1|       LEN (always >=8)        |     SIDX      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                     SDUR                      |     TLEN      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     TLEN      |                                               |
+   +-+-+-+-+-+-+-+-+                                               |
+   |             text string fragment (no.bytes=TLEN)              |
+   |                                                               |
+   |                                                               |
+   |                                               +-+-+-+-+-+-+-+-+
+   |                                               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 13. An RTP packet carrying a TYPE 5 and a TYPE 1 unit
+
+ In Figure 13, a sample description and a TYPE 1 unit are aggregated.
+ The TYPE 1 unit happens to contain only text strings and is small, so
+ an additional TYPE 5 unit is included to take advantage of the
+ available bits in the packet.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                           timestamp                           |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |           synchronization source (SSRC) identifier            |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |U|   R   |TYPE2|        LEN (always >9)        |TOTAL=4|THIS=1 |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                     SDUR                      |     SIDX      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |             SLEN              |                               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
+   |            text string fragment (no.bytes=LEN - 9)            |
+   |                                                               |
+   :                                                               :
+   :                                                               :
+   |                                               +-+-+-+-+-+-+-+-+
+   |                                               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 14. Payload with first text string fragment of a sample
+
+ In Figures 14, 15, and 16, a text sample is split into three RTP
+ packets. In Figure 14, the text string is big and takes the whole
+ packet length. In Figure 15, the only possibility for carrying two
+ fragments of the same text sample is represented (see configuration 3
+ in Section 4.6). The last packet, shown in Figure 16, carries the
+ last modifier fragment, a TYPE 4.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                           timestamp                           |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |           synchronization source (SSRC) identifier            |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |U|   R   |TYPE2|        LEN (always >9)        |TOTAL=4|THIS=2 |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                     SDUR                      |     SIDX      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |             SLEN              |                               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
+   |            text string fragment (no.bytes=LEN - 9)            |
+   |                                                               |
+   |                                                               |
+   |                                                               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |U|   R   |TYPE3|        LEN (always >6)        |TOTAL=4|THIS=3 |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                     SDUR                      |               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+   |                                                               |
+   |                 modifiers (no.bytes=LEN - 6)                  |
+   |                                               +-+-+-+-+-+-+-+-+
+   |                                               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 15. An RTP packet carrying a TYPE 2 unit and a TYPE 3 unit
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                           timestamp                           |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |           synchronization source (SSRC) identifier            |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |U|   R   |TYPE4|        LEN (always >6)        |TOTAL=4|THIS=4 |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                     SDUR                      |               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+   |                                                               |
+   |                 modifiers (no.bytes=LEN - 6)                  |
+   |                                               +-+-+-+-+-+-+-+-+
+   |                                               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 16. An RTP packet carrying last modifiers fragment (TYPE 4)
+
+4.8. Relation to RFC 3640
+
+ RFC 3640 [12] defines a payload format for the transport of any non-
+ multiplexed MPEG-4 elementary stream. One of the various MPEG-4
+ elementary stream types is MPEG-4 timed text streams, specified in
+ MPEG-4 part 17 [26], also known as ISO/IEC 14496-17. MPEG-4 timed
+ text streams are capable of carrying 3GPP timed text data, as
+ specified in 3GPP TS 26.245 [1].
+
+ MPEG-4 timed text streams are intentionally constructed so as to
+ guarantee interoperability between RFC 3640 and this payload format.
+ This means that the construction of the RTP packets carrying timed
+ text is the same. That is, the MPEG-4 timed text elementary stream
+ as per ISO/IEC 14496-17 is identical to the (aggregate) payloads
+ constructed using this payload format.
+
+ Figure 17 illustrates the process of constructing an RTP packet
+ containing timed text. As can be seen in the partition block, the
+ (transport) units used in this payload format are identical to the
+ Timed Text Units (TTUs) defined in ISO/IEC 14496-17. Likewise, the
+ rules for payload aggregation as per Section 4.6 are identical to
+ those defined in ISO/IEC 14496-17 and are compliant with RFC 3640.
+ As a result, an RTP packet that uses this payload format is identical
+ to an RTP packet using RFC 3640 conveying TTUs according to ISO/IEC
+ 14496-17. In particular, MPEG-4 Part 17 specifies that when using
+ RFC 3640 for transporting timed text streams, the "streamType"
+ parameter value is set to 0x0D, and the value of the
+ "objectTypeIndication" in "config" takes the value 0x08.
+
+ +--------------------------------------+
+ Text samples | +--------------+ +--------------+ |
+ as per 3GPP | |Text Sample 1 | |Text Sample N | |
+ TS 26.245 | +--------------+ +--------------+ |
+ +--------------------------------------+
+ \/
+ +-------------------------------------------------------------------+
+ | Partition Text Samples into units. TTU[i]= TYPE i units. |
+ | |
+ |[U R TYPE LEN][{TOTAL,THIS}SIDX{SDUR}{TLEN}{SLEN}][SampleContents] |
+ |{..} means present if applicable, [..] means always present |
+ +-------------------------------------------------------------------+
+ \/ \/
+ +-------------------------------------------------------------------+
+ | Aggregation (if possible) |
+ +-------------------------------------------------------------------+
+ \/ \/
+ +-------------------------------------------------------------------+
+ | RTP Entity adds and fills RTP header and Sends RTP packet, where |
+ | RTP packets according to this Payload Format = |
+ | RTP packets carrying MPEG-4 Timed Text ES over RFC 3640 |
+ +-------------------------------------------------------------------+
+
+ Figure 17. Relation to RFC 3640
+
+ Note: The use of RFC 3640 for transport of ISO/IEC 14496-17 data does
+ not require any new SDP parameters or any new mode definition.
+
+4.9. Relation to RFC 2793
+
+ RFC 2793 [22] and its revision, RFC 4103 [23], specify a protocol for
+ enabling text conversation. Typical applications of that payload
+ format are text communication terminals and text conferencing tools.
+ Text session contents are specified in ITU-T Recommendation T.140
+ [24]. T.140 text is UTF-8 coded as specified in T.140 [24] with no
+ extra framing. The T140block contains one or more T.140 code
+ elements as specified in T.140. Code elements are control sequences
+ such as "New Line", "Interrupt", "String Terminator", or "Start of
+ String". Most T.140 code elements are single ISO 10646 [25]
+ characters, but some are multiple character sequences. Each
+ character is UTF-8 encoded [18] into one or more octets.
+
+ This payload format may also be used for conversational applications
+ (even for instant messaging). However, this is not its main target.
+ The differentiating feature of 3GPP Timed Text media format is that
+ it allows text decoration. This is especially useful in multimedia
+ presentations, karaoke, commercial banners, news tickers, clickable
+ text strings, and captions. T.140 text contents used in RFC 2793 do
+ not allow the use of text decoration.
+
+ Furthermore, the conversational text RTP payload format recommends a
+ method to include redundant text from already transmitted packets in
+ order to reduce the risk of text loss caused by packet loss. With
+ that method, payloads include a redundant copy of the last payload
+ sent. This payload format does not describe such a method, but the
+ same technique is applicable here. As explained in Section 5, packet
+ redundancy SHOULD be used whenever possible. The aggregation
+ guidelines in Section 4.6 allow redundant payloads.
+
+5. Resilient Transport
+
+ Apart from the basic fragmentation guidelines described in the
+ section above, the simplest option for packet-loss-resilient
+ transport is packet repetition. This mechanism may consist of a
+ strict window-based repetition mechanism or, simply, a repetition
+ mechanism in a wider sense, where new and old packets are mixed, for
+ example.
+
+ A server MAY decide to use repetition as a measure for packet loss
+ resilience. Thereby, a server MAY send the same RTP payloads or just
+ some of the units from the payloads.
+
+ As for the case of complete payloads, single repeated units MUST
+ exactly match the same units sent in the first transmission; i.e., if
+ fragmentation is needed, it SHALL be performed only once for each
+ text sample. Only then can a receiver use the already received and
+ the repeated units to reconstruct the original text samples. Since
+ the RTP timestamp is used to group together the fragments of a
+ sample, care must be taken to preserve the timing of units when
+ constructing new RTP packets.
+
+ For example, if a text sample was originally sent as a single
+ non-fragmented text sample (one TYPE 1 unit), a repetition of
+ that sample MUST be sent also as a single non-fragmented text
+ sample in one unit. Likewise, if the original text sample was
+ fragmented and spread over several RTP packets (say, a total of
+ 3 units), then the repeated fragments SHALL also have the same
+ byte boundaries and use the same unit headers and bytes per
+ fragment.
+
+ With repetition, repeated units resolve to the same timestamp as
+ their originals. Where redundant units are available, only one of
+ them SHALL be used.
+
+ Regarding the RTP header fields:
+
+ o If the whole RTP payload is repeated, all payload-specific fields
+ in the RTP header (the M, TS and PT fields) MUST keep their
+ original values except the sequence number, which MUST be
+ incremented to comply with RTP (the TOTAL/THIS fields make it
+ possible to reassemble fragments with different sequence numbers).
+
+ o In packets containing single repeated units, the general rules in
+ Section 3 for assigning values to the RTP header fields apply.
+ Keeping the value of the RTP timestamp to preserve the timing of
+ the units is particularly relevant here.
+
+ Apart from repetition, other mechanisms such as FEC [7],
+ retransmission [11], or similar techniques could be used to cope with
+ packet losses.
+
+6. Congestion Control
+
+ Congestion control for RTP SHALL be implemented in accordance with
+ RTP [3] and the applicable RTP profile, e.g., RTP/AVP [17].
+
+ When using this payload format, two main factors may affect
+ congestion control:
+
+ o The use of (unit) aggregation may make the payload format more
+ bandwidth efficient, by avoiding header overhead and thus reducing
+ the used bitrate.
+
+ o The use of resilient transport mechanisms: Although timed text
+ applications typically operate at low bitrates, the increase due to
+ resilient transport shall be considered for congestion control
+ mechanisms. This applies to all mechanisms but especially to less
+ efficient ones like repetition.
+
+7. Scene Description
+
+7.1. Text Rendering Position and Composition
+
+ In order to set up a timed text session, regardless of whether the
+ stream is stored in a 3GP file or streamed live, some initial layout
+ information is needed by the communicating peers.
+
+   +-------------------------------------------+
+   |      <-> tx                               |    +-------------+
+   |     +-------------------------------+     |<---|Display Area |
+   |  ^  |                               |     |    +-------------+
+   |  :  |                               |     |
+   |  :ty|                               |     |    +-------------+
+   |  :  |                               |<---------|Video track  |
+   |  :  |                               |     |    +-------------+
+   |  :  |                               |     |
+   |  :  |                               |     |
+   |  :  |                               |     |
+   |  v  |                               |     |
+   |  -  |   x-------------------------+ |     |    +-------------+
+   |h ^  |   |                         |<-----------|Text Track   |
+   |e :  +---|-------------------------|-+     |    +-------------+
+   |i :      | +---------------------+ |       |
+   |g :      | |                     | |       |    +-------------+
+   |h :      | |                     |<------------ |Text Box     |
+   |t v      | +---------------------+ |       |    +-------------+
+   |  -      +-------------------------+       |
+   +-------------------------------------------+
+             <........................>
+                     w i d t h
+
+ Figure 18. Illustration of text rendering position and composition
+
+ The parameters used for negotiating the position and size of the text
+ track in the display area are shown in Figure 18. These are the
+ "width" and "height" of the text track, its translation values, "tx"
+ and "ty", and its "layer" or proximity to the user.
+
+ At the same time, the sender of the stream needs to know the
+ receiver's capabilities. In this case, these are the maximum
+ allowable values for the text track height and width, "max-h" and
+ "max-w", for the stream that the receiver is to display.
+
+ This layout information MUST be conveyed in a reliable form before
+ the start of the session, e.g., during session announcement or in an
+ Offer/Answer (O/A) exchange. An example of a reliable transport may
+ be the out-of-band channel used for SDP. Sections 8 and 9 provide
+ details on the mapping of these parameters to SDP descriptions and
+ their usage in O/A.
+
+ For stored content, the layout values expressing stream properties
+ MUST be obtained from the Track Header Box. See Section 7.3.
+
+ For live streaming, appropriate values as negotiated during session
+ setup shall be used.
+
+7.2. SMIL Usage
+
+ The attributes contained in the Track Header Boxes of a 3GP file only
+ specify the spatial relationship of the tracks within the given 3GP
+ file.
+
+ If multiple 3GP files are sent, they require spatial synchronization.
+ For example, for a text and video stream, the positions of the text
+ and video tracks in Figure 18 shall be determined. For this purpose,
+ SMIL [9] MAY be used.
+
+ SMIL assigns regions in the display to each of those files and places
+ the tracks within those regions. Generally, in SMIL, the position of
+ one track (or stream) is expressed relative to another track. This
+ is different from the 3GP file, where the upper left corner is the
+ reference for all translation offsets. Hence, the translation offset
+ in SMIL has the same value as (tx, ty) in the 3GP file only if the
+ position in SMIL is expressed relative to the video track origin.
+
+ Note also that the original track header information is used for each
+ track only within its region, as assigned by SMIL. Therefore, even
+ if SMIL scene description is used, the track header information
+ pieces SHOULD be sent anyway, as they represent the intrinsic media
+ properties. See 3GPP SMIL Language Profile in [27] for details.
+
+7.3. Finding Layout Values in a 3GP File
+
+ In a 3GP file, within the Track Header Box (tkhd):
+
+ o tx, ty: These values specify the translation offset of the
+ (text) track relative to the upper left corner of the video
+ track, if present. They are the third-to-last and second-to-last
+ values, respectively, of the transformation matrix (the unity
+ matrix by default). The values are fixed-point 16.16 values,
+ restricted to be (signed) integers (i.e., the lower 16 bits of
+ each value shall be all zeros). Therefore, only the upper 16
+ bits are used for obtaining the value of the media type
+ parameters.
+
+ o width, height: They have the same name in the tkhd box. All
+ (unsigned) 32 bits are meaningful.
+
+ o layer: All (signed) 16 bits are used.
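+
+ The extraction rules above can be sketched as follows. This is a
+ minimal illustration, not part of the payload format: the function
+ name "layout_params" is hypothetical, and it assumes that the caller
+ has already isolated the raw big-endian "layer", "matrix", "width",
+ and "height" fields of the tkhd box, with the nine matrix values
+ stored in the order {a, b, u, c, d, v, x, y, w}.

```python
import struct


def layout_params(layer: bytes, matrix: bytes, width: bytes, height: bytes) -> dict:
    """Derive layout parameter values from raw big-endian tkhd fields.

    Hypothetical helper: the caller is assumed to have located the
    fields inside the Track Header Box already.
    """
    m = struct.unpack(">9i", matrix)             # nine signed 32-bit values
    (layer_value,) = struct.unpack(">h", layer)  # all signed 16 bits are used
    (width_value,) = struct.unpack(">I", width)  # all unsigned 32 bits are used
    (height_value,) = struct.unpack(">I", height)
    return {
        # 16.16 fixed point restricted to integers: keep the upper 16 bits.
        "tx": m[6] >> 16,   # third-to-last matrix value
        "ty": m[7] >> 16,   # second-to-last matrix value
        "layer": layer_value,
        "width": width_value,
        "height": height_value,
    }
```

+ The arithmetic right shift keeps the sign of negative offsets, so a
+ stored value of -5 in 16.16 notation yields ty=-5.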
+
+8. 3GPP Timed Text Media Type
+
+ The media subtype for the 3GPP Timed Text codec is allocated from the
+ standards tree. The top-level media type under which this payload
+ format is registered is 'video'. This registration is done using the
+ template defined in [29] and following RFC 3555 [28].
+
+ The receiver MUST ignore any unrecognized parameter.
+
+ Media type: video
+
+ Media subtype: 3gpp-tt
+
+ Required parameters
+
+ rate:
+ Refer to Section 3 in RFC 4396.
+
+ sver:
+ The parameter "sver" contains a list of supported
+ backwards-compatible versions of the timed text format
+ specification (3GPP TS 26.245) that the sender of the
+ parameter is willing to receive (and, equally, to send).
+ The first value is the version preferred for receiving
+ (or sending). It MAY be followed by a comma-separated
+ list of versions that SHOULD be used as alternatives.
+ The order is meaningful: the first entry is the most
+ preferred and the last the least preferred. Each entry
+ has the format
+ Zi(xi*256+yi), where "Zi" is the number of the Release
+ and "xi" and "yi" are taken from the 3GPP specification
+ version (i.e., vZi.xi.yi). For example, for 3GPP TS
+ 26.245 v6.0.0, Zi(xi*256+yi)=6(0), the version value is
+ "60". (Note that "60" is the concatenation of the
+ values Zi=6 and (xi*256+yi)=0 and not their product.)
+
+ If no "sver" value is available, for example, when
+ streaming out of a 3GP file, the default value "60",
+ corresponding to the 3GPP Release 6 version of 3GPP TS
+ 26.245, SHALL be used.
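+
+ As an illustration of the Zi(xi*256+yi) format described above, the
+ following sketch builds "sver" values from specification version
+ numbers. The function names are hypothetical and chosen only for
+ this example.

```python
def sver_entry(release: int, x: int, y: int) -> str:
    """Format one "sver" entry for specification version v<release>.<x>.<y>.

    The release number and (x*256 + y) are concatenated as decimal
    strings; they are not multiplied.
    """
    return f"{release}{x * 256 + y}"


def sver_param(versions) -> str:
    """Join entries in preference order (most preferred first)."""
    return ",".join(sver_entry(*version) for version in versions)
```

+ With these definitions, v6.0.0 maps to "60" and v6.1.0 maps to
+ "6256", so a sender preferring v6.1.0 over v6.0.0 would signal
+ sver=6256,60.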
+
+ Optional parameters:
+
+ tx:
+ This parameter indicates the horizontal translation
+ offset in pixels of the text track with respect to the
+ origin of the video track. This value is the decimal
+ representation of a 16-bit signed integer. Refer to TS
+ 3GPP 26.245 for an illustration of this parameter.
+
+ ty:
+ This parameter indicates the vertical translation offset
+ in pixels of the text track with respect to the origin
+ of the video track. This value is the decimal
+ representation of a 16-bit signed integer. Refer to TS
+ 3GPP 26.245 for an illustration of this parameter.
+
+ layer:
+ This parameter indicates the proximity of the text track
+ to the viewer. More negative values mean closer to the
+ viewer. This parameter has no units. This value is the
+ decimal representation of a 16-bit signed integer.
+
+ tx3g:
+ This parameter MUST be used for conveying sample
+ descriptions out-of-band. It contains a comma-separated
+ list of base64-encoded entries. The entries of this
+ list MAY appear in any order, and the list SHALL NOT be
+ empty. Each entry is the result of running
+ base64 encoding over the concatenation of the (static)
+ SIDX value as an 8-bit unsigned integer and the (static)
+ sample description for that SIDX, in that order. The
+ format of a sample description entry can be found in
+ 3GPP TS 26.245 Release 6 and later releases. All
+ servers and clients MUST understand this parameter and
+ MUST be capable of using the sample description(s)
+ contained in it. Please refer to RFC 3548 [6] for
+ details on the base64 encoding.
+
+ width:
+ This parameter indicates the width in pixels of the text
+ track or area of the text being sent. This value is the
+ decimal representation of a 32-bit unsigned integer.
+ Refer to TS 3GPP 26.245 for an illustration of this
+ parameter.
+
+
+ height:
+ This parameter indicates the height in pixels of the
+ text track being sent. This value is the decimal
+ representation of a 32-bit unsigned integer. Refer to
+ TS 3GPP 26.245 for an illustration of this parameter.
+
+ max-w:
+ This parameter indicates display capabilities. This is
+ the maximum "width" value that the sender of this
+ parameter supports. This value is the decimal
+ representation of a 32-bit unsigned integer.
+
+ max-h:
+ This parameter indicates display capabilities. This is
+ the maximum "height" value that the sender of this
+ parameter supports. This value is the decimal
+ representation of a 32-bit unsigned integer.
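+
+ The construction of the "tx3g" value described above (one SIDX octet
+ followed by the sample description, base64-encoded, with entries
+ separated by commas) can be sketched as follows. The function names
+ are hypothetical, and the sample description bytes used with them
+ are up to the caller; this sketch does not validate their contents.

```python
import base64


def tx3g_entry(sidx: int, sample_description: bytes) -> str:
    """Base64-encode one SIDX octet followed by its sample description."""
    if not 0 <= sidx <= 255:
        raise ValueError("SIDX must fit in an 8-bit unsigned integer")
    return base64.b64encode(bytes([sidx]) + sample_description).decode("ascii")


def tx3g_param(descriptions: dict) -> str:
    """Build the comma-separated "tx3g" value; the list must not be empty."""
    if not descriptions:
        raise ValueError('"tx3g" SHALL NOT be empty')
    return ",".join(tx3g_entry(sidx, desc) for sidx, desc in descriptions.items())


def parse_tx3g(value: str) -> dict:
    """Map each SIDX to its sample description bytes."""
    parsed = {}
    for entry in value.split(","):
        raw = base64.b64decode(entry)
        parsed[raw[0]] = raw[1:]
    return parsed
```

+ Because the comma is not part of the base64 alphabet, splitting the
+ value on commas is safe.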
+
+ Encoding considerations:
+
+ This media type is framed (see Section 4.8 in [29]) and
+ partially contains binary data.
+
+ Restrictions on usage:
+
+ This media type depends on RTP framing, and hence is only
+ defined for transfer via RTP [3]. Transport within other
+ framing protocols is not defined at this time.
+
+ Security considerations:
+
+ Please refer to Section 11 of RFC 4396.
+
+ Interoperability considerations:
+
+ The 3GPP Timed Text media format and its file storage are
+ specified in Release 6 of 3GPP TS 26.245, "Transparent
+ end-to-end packet switched streaming service (PSS); Timed Text
+ Format (Release 6)". Note also that 3GPP may in future releases
+ specify extensions or updates to the timed text media format in
+ a backwards-compatible way, e.g., new modifier boxes or
+ extensions to the sample descriptions. The payload format
+ defined in RFC 4396 allows for such extensions. For future 3GPP
+ Releases of the Timed Text Format, the parameter "sver" is used
+ to identify the exact specification used.
+
+
+ The defined storage format for 3GPP Timed Text format is the
+ 3GPP File Format (3GP) [30]. 3GP files may be transferred using
+ the media type video/3gpp as registered by RFC 3839 [31]. The
+ 3GPP File Format is a container file that may contain, e.g.,
+ audio and video that may be synchronized with the 3GPP Timed
+ Text.
+
+ Published specification: RFC 4396
+
+ Applications which use this media type:
+
+ Multimedia streaming applications.
+
+ Additional information:
+
+ The 3GPP Timed Text media format is specified in 3GPP TS 26.245,
+ "Transparent end-to-end packet switched streaming service (PSS);
+ Timed Text Format (Release 6)". This document and future
+ extensions to the 3GPP Timed Text format are publicly available
+ at http://www.3gpp.org.
+
+ Magic number(s): None.
+
+ File extension(s): None.
+
+ Macintosh File Type Code(s): None.
+
+ Person & email address to contact for further information:
+
+ Jose Rey, jose.rey@eu.panasonic.com
+ Yoshinori Matsui, matsui.yoshinori@jp.panasonic.com
+ Audio/Video Transport Working Group.
+
+ Intended usage: COMMON
+
+ Authors:
+ Jose Rey
+ Yoshinori Matsui
+
+ Change controller: IETF Audio/Video Transport Working Group delegated
+ from the IESG.
+
+
+9. SDP Usage
+
+9.1. Mapping to SDP
+
+ The information carried in the media type specification has a
+ specific mapping to fields in SDP [4]. If SDP is used to specify
+ sessions using this payload format, the mapping is done as follows:
+
+ o The media type ("video") goes in the SDP "m=" line as the media name.
+
+ m=video <port number> RTP/<RTP profile> <dynamic payload type>
+
+ o The media subtype ("3gpp-tt") and the timestamp clockrate "rate"
+ (the RECOMMENDED 1000 Hz or another value) go in the SDP
+ "a=rtpmap" line as the encoding name and rate, respectively:
+
+ a=rtpmap:<payload type> 3gpp-tt/1000
+
+ o The REQUIRED parameter "sver" goes in the SDP "a=fmtp" attribute by
+ copying it directly from the media type string as a semicolon-
+ separated parameter=value pair.
+
+ o The OPTIONAL parameters "tx", "ty", "layer", "tx3g", "width",
+ "height", "max-w" and "max-h" go in the SDP "a=fmtp" attribute by
+ copying them directly from the media type string as a
+ semicolon-separated list of parameter=value(s) pairs:
+
+ a=fmtp:<dynamic payload type> <parameter
+ name>=<value>[,<value>][; <parameter name>=<value>]
+
+ o Any parameter unknown to the device that uses the SDP SHALL be
+ ignored. For example, parameters added to the media format in
+ later specifications MAY be copied into the SDP and SHALL be
+ ignored by receivers that do not understand them.
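+
+ The mapping rules above can be sketched as a pair of helpers: one
+ that emits the three SDP lines and one that reads an "a=fmtp" line
+ back while ignoring unknown parameters. The function names, the
+ default RTP profile, and the port number in the usage note are
+ assumptions made for this example only.

```python
KNOWN_PARAMS = {"sver", "tx", "ty", "layer", "tx3g",
                "width", "height", "max-w", "max-h"}


def build_media_section(port, payload_type, params, rate=1000, profile="AVP"):
    """Return the m=, a=rtpmap and a=fmtp lines for a 3gpp-tt stream."""
    if "sver" not in params:
        raise ValueError('"sver" is a required parameter')
    fmtp = "; ".join(f"{name}={value}" for name, value in params.items())
    return [
        f"m=video {port} RTP/{profile} {payload_type}",
        f"a=rtpmap:{payload_type} 3gpp-tt/{rate}",
        f"a=fmtp:{payload_type} {fmtp}",
    ]


def parse_fmtp(line):
    """Parse an a=fmtp line, silently dropping unknown parameters."""
    _, _, body = line.partition(" ")
    parsed = {}
    for pair in body.split(";"):
        # Split on the first "=" only: base64 values may end in "=".
        name, sep, value = pair.strip().partition("=")
        if sep and name in KNOWN_PARAMS:
            parsed[name] = value
    return parsed
```

+ For instance, build_media_section(5004, 98, {"tx": 100, "sver":
+ "60"}) yields an "a=fmtp:98 tx=100; sver=60" line, where 5004 is an
+ arbitrary port chosen for illustration.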
+
+9.2. Parameter Usage in the SDP Offer/Answer Model
+
+ In this section, the meaning of the SDP parameters defined in this
+ document within the Offer/Answer [13] context is explained.
+
+ In unicast, sender and receiver typically negotiate the streams,
+ i.e., which codecs and parameter values are used in the session.
+ This is also possible in multicast to a lesser extent.
+
+ Additionally, the meaning of the parameters MAY vary depending on
+ which direction is used. In the following sections, a
+ "<directionality> offer" means an offer that contains a stream set to
+ <directionality>. <directionality> may take the values sendrecv,
+ sendonly, and recvonly. Similar considerations apply for answers.
+ For example, an answer to a sendonly offer is a recvonly answer.
+
+9.2.1. Unicast Usage
+
+ The following types of parameters are used in this payload format:
+
+ 1. Declarative parameters: Offerer and answerer declare the values
+ they will use for the incoming (sendrecv/recvonly) or outgoing
+ (sendonly) stream. Offerer and answerer MAY use different
+ values.
+
+ a. "tx", "ty", and "layer": These are parameters describing
+ where the received text track is placed. Depending on the
+ directionality:
+
+ i. They MUST appear in all sendrecv offers and answers and
+ in all recvonly offers and answers (thus applying to
+ the incoming stream). In the case of sendrecv offers
+ and answers and in recvonly offers, these values SHOULD
+ be used by the sender of the stream unless it has a
+ particular preference, in which case, it MUST make sure
+ that these different values do not corrupt the
+ presentation. For recvonly answers, the answerer MAY
+ accept the proposed values for the incoming stream (in
+ a sendonly offer; see ii. below) or respond with
+ different ones. The offerer MUST use the returned
+ values.
+
+ ii. They MAY appear in sendonly offers and MUST appear in
+ sendonly answers. In sendonly offers, they specify the
+ values that the offerer proposes for sending (see
+ example in Section 9.3). In sendonly answers, these
+ values SHOULD be copied from the corresponding recvonly
+ offer upon accepting the stream, unless a particular
+ preference by the receiver of the stream exists, as
+ explained in the previous point.
+
+ 2. Parameters describing the display capabilities: "max-h" and
+ "max-w" indicate the maximum dimensions of the text track (text
+ display area) for the incoming stream, i.e., the stream to which
+ "tx" and "ty" apply (see Figure 18). "max-h" and "max-w" MUST be
+ included in all offers and answers where "tx" and "ty" refer to
+ the incoming stream, thus excluding sendonly offers and answers
+ (see example in Section 9.3), where they SHALL NOT be present.
+
+ 3. Parameters describing the sent stream properties, i.e., the
+ sender of the stream decides upon the values of these:
+
+ a. "width" and "height" specify the text track dimensions.
+ They SHALL ALWAYS be present in sendrecv and sendonly
+ offers and answers. For recvonly answers, the answerer
+ MUST include the offered parameter values (if any) verbatim
+ in the answer upon accepting the stream.
+
+ b. "tx3g" contains static sample descriptions. It MAY only be
+ present in sendrecv and sendonly offers and answers. This
+ parameter applies to the stream that offerers or answerers
+ send.
+
+ 4. Negotiable parameters, which MUST be agreed on. This is the
+ case of "sver". This parameter MUST be present in every offer
+ and answer. The answerer SHALL choose one supported value from
+ the offerer's list, or else it MUST remove the stream or reject
+ the session.
+
+ 5. Symmetric parameters: "rate", timestamp clockrate, belongs to
+ this class. Symmetric parameters MUST be echoed verbatim in the
+ answer. Otherwise, the stream MUST be removed or the session
+ rejected.
+
+ The following table summarizes all options:
+
+   +--------------+---------------+----------+----------+----------+
+   | Type of      | Parameter(s)  | sendrecv | recvonly | sendonly |
+   | parameter    |               |   O/A    |   O/A    |   O/A    |
+   +--------------+---------------+----------+----------+----------+
+   | Declarative  | tx, ty, layer |   M/M    |   M/M    |   m/M    |
+   +--------------+---------------+----------+----------+----------+
+   | Display      | max-h, max-w  |   M/M    |   M/M    |   -/-    |
+   | capabilities |               |          |          |          |
+   +--------------+---------------+----------+----------+----------+
+   | Stream       | height, width |   M/M    |  -/(M)   |   M/M    |
+   | properties   | tx3g          |   m/m    |   -/-    |   m/m    |
+   +--------------+---------------+----------+----------+----------+
+   | Negotiable   | sver          |   M/M    |   M/M    |   M/M    |
+   +--------------+---------------+----------+----------+----------+
+   | Symmetric    | rate          |   M/M    |   M/M    |   M/M    |
+   +--------------+---------------+----------+----------+----------+
+
+ Table 1. Parameter usage in Unicast Offer / Answer.
+
+ KEY:
+ o M means MUST be present.
+ o m means MAY be present (such as proposed values).
+ o (M) or (m) means MUST or MAY, if applicable.
+ o a hyphen ("-") means the parameter MUST NOT be present.
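+
+ The presence rules of Table 1 lend themselves to a table-driven
+ check. The sketch below encodes the table cells verbatim and reports
+ missing mandatory parameters and forbidden ones. The names used are
+ hypothetical; "(M)" and "m" cells are not enforced, because whether
+ they apply depends on the session context.

```python
DIRECTIONS = ("sendrecv", "recvonly", "sendonly")

# One row per parameter group; cells are "offer/answer" per direction.
TABLE_1 = [
    (("tx", "ty", "layer"), ("M/M", "M/M", "m/M")),
    (("max-h", "max-w"), ("M/M", "M/M", "-/-")),
    (("height", "width"), ("M/M", "-/(M)", "M/M")),
    (("tx3g",), ("m/m", "-/-", "m/m")),
    (("sver",), ("M/M", "M/M", "M/M")),
    (("rate",), ("M/M", "M/M", "M/M")),
]


def rule(param, direction, role):
    """Return "M", "m", "(M)" or "-" for a parameter in an offer or answer."""
    for names, cells in TABLE_1:
        if param in names:
            offer, answer = cells[DIRECTIONS.index(direction)].split("/")
            return offer if role == "offer" else answer
    raise KeyError(param)


def check_presence(present, direction, role):
    """List violations of the MUST / MUST NOT cells of Table 1."""
    problems = []
    for names, _ in TABLE_1:
        for param in names:
            requirement = rule(param, direction, role)
            if requirement == "M" and param not in present:
                problems.append(f"{param} missing")
            elif requirement == "-" and param in present:
                problems.append(f"{param} not allowed")
    return problems
```

+ An empty result means that the set of parameters is consistent with
+ the table for the given directionality and role ("offer" or
+ "answer"). Note that "rate" is carried in "a=rtpmap" rather than in
+ "a=fmtp", so a caller would add it to the set separately.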
+
+ Other observations regarding parameter usage:
+
+ o Translation and layer values: In sendonly offers, "tx",
+ "ty", and "layer" indicate proposed values. This is useful for
+ visually composed sessions where the different streams occupy
+ different parts of the display, e.g., a video stream and the
+ captions. These are just suggested values; the peer rendering
+ the text ultimately decides where to place the text track.
+
+ o Text track (area) dimensions, "height" and "width": In the case
+ of sendonly offers, an answerer accepting the offer MUST be
+ prepared to render the stream using these values. If it cannot
+ do so, the stream MUST be removed or the session rejected.
+
+ o Display capabilities, "max-h" and "max-w": An answerer sending a
+ stream SHALL ensure that the "height" and "width" values in the
+ answer are compatible with the offerer's signaled capabilities.
+
+ o Version handling via "sver": The idea is that offerer and
+ answerer communicate using the same version. This is achieved by
+ letting the answerer choose from a list of supported versions,
+ "sver". For recvonly streams, the first value in the list is the
+ preferred version to receive. Consequently, for sendonly (and
+ sendrecv) streams, the first value is the one preferred for
+ sending (and receiving). The answerer MUST choose one value and
+ return it in the answer. Upon receiving the answer, the offerer
+ SHALL be prepared to send (sendonly and sendrecv) and receive
+ (recvonly and sendrecv) a stream using that version. If none of
+ the versions in the list is supported, the stream MUST be removed
+ or the session rejected. Note that, if alternative non-
+ compatible versions are offered, then this SHALL be done using
+ different payload types.
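+
+ The version selection described in the last bullet can be sketched
+ as below. The function name is hypothetical, and walking the list in
+ the offerer's preference order is an assumption of this example; an
+ answerer is free to apply its own preference among the supported
+ values.

```python
from typing import Iterable, Optional


def choose_sver(offered: str, supported: Iterable[str]) -> Optional[str]:
    """Pick one version from the offerer's "sver" list.

    Returns None when no offered version is supported, in which case
    the stream has to be removed or the session rejected.
    """
    supported_set = set(supported)
    for version in (entry.strip() for entry in offered.split(",")):
        if version in supported_set:
            return version
    return None
```

+ For the offers in Section 9.3 ("sver=6256,60"), an answerer that
+ only supports "60" would return "60" in its answer.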
+
+9.2.2. Multicast Usage
+
+ In multicast, the parameter usage is similar to the unicast case,
+ except as follows:
+
+ o The parameters "tx", "ty", and "layer" in multicast offers only
+ have meaning for sendrecv and recvonly streams. In order for all
+ clients to have the same view of the session, they MUST be used
+ symmetrically.
+
+ o For "height", "width", and "tx3g" (for sendrecv and sendonly),
+ multicast offers specify which values of these parameters the
+ participants MUST use for sending. Thus, if the stream is
+ accepted, the answerer MUST also include them verbatim in the
+ answer (also "tx3g", if present).
+
+ o The capability parameters, "max-h" and "max-w", SHALL NOT be used
+ in multicast. If the offered text track should change in size, a
+ new offer SHALL be used instead.
+
+ o Regarding version handling:
+
+ In the case of multicast offers, an answerer MAY accept a multicast
+ offer as long as one of the versions listed in the "sver" is
+ supported. Therefore, if the stream is accepted, the answerer MUST
+ choose its preferred version, but, unlike in unicast, the offerer
+ SHALL NOT change the offered stream to this chosen version because
+ there may be other session participants that do support the newer
+ extensions. Consequently, different session participants may end
+ up using different backwards-compatible media format versions. It
+ is RECOMMENDED that the multicast offer contains a limited number
+ of versions, in order for all participants to have the same view of
+ the session. This is a responsibility of the session creator. If
+ none of the offered versions is supported, the stream SHALL be
+ removed or the session rejected. Also in this case, if alternative
+ non-compatible versions are offered, then this SHALL be done using
+ different payload types.
+
+9.3. Offer/Answer Examples
+
+ In these unicast O/A examples, long lines are wrapped.
+ Static sample descriptions are shortened for clarity.
+
+ For sendrecv:
+
+ O -> A
+
+ m=video <port> RTP/AVP 98
+ a=rtpmap:98 3gpp-tt/1000
+ a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100; max-h=120;
+ max-w=160; sver=6256,60; tx3g=81...
+ a=sendrecv
+
+ A -> O
+
+ m=video <port> RTP/AVP 98
+ a=rtpmap:98 3gpp-tt/1000
+ a=fmtp:98 tx=100; ty=95; layer=0; height=90; width=100; max-h=100;
+ max-w=160; sver=60; tx3g=82...
+ a=sendrecv
+
+ In this example, the offerer is telling the answerer where it will
+ place the received stream and what is the maximum height and width
+ allowable for the stream that it will receive. Also, it tells the
+ answerer the dimensions of the text track for the stream sent and
+ which sample description it shall use. It offers two versions, 6256
+ and 60. The answerer responds with an equivalent set of parameters
+ for the stream it receives. In this case, the answerer's "max-h" and
+ "max-w" are compatible with the offerer's "height" and "width".
+ Otherwise, the answerer would have to remove this stream, and the
+ offerer would have to issue a new offer taking the answerer's
+ capabilities into account. This is possible only if multiple payload
+ types are present in the initial offer so that at least one of them
+ matches the answerer's capabilities as expressed by "max-h" and
+ "max-w" in the negative answer. Note also that the answerer's text
+ box dimensions fit within the maximum values signaled in the offer.
+ Finally, the answerer chooses to use version 60 of the timed text
+ format.
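+
+ The compatibility reasoning in this example can be expressed as a
+ small check. This sketch assumes that "compatible" means that each
+ side's "height" and "width" do not exceed the other side's "max-h"
+ and "max-w"; the function names are hypothetical, and the parameter
+ dictionaries are assumed to hold the fmtp values as strings.

```python
def fits(height: int, width: int, max_h: int, max_w: int) -> bool:
    """True if a text track of the given size fits the signaled maximum."""
    return height <= max_h and width <= max_w


def sendrecv_compatible(offer: dict, answer: dict) -> bool:
    """Check both directions of a sendrecv offer/answer pair."""
    offer_fits_answerer = fits(int(offer["height"]), int(offer["width"]),
                               int(answer["max-h"]), int(answer["max-w"]))
    answer_fits_offerer = fits(int(answer["height"]), int(answer["width"]),
                               int(offer["max-h"]), int(offer["max-w"]))
    return offer_fits_answerer and answer_fits_offerer
```

+ Applied to the sendrecv example above, the offered 80x100 track fits
+ the answerer's 100x160 maximum and the answered 90x100 track fits
+ the offerer's 120x160 maximum (height x width in both cases), so the
+ check succeeds.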
+
+
+ For recvonly:
+
+ O -> A
+
+ m=video <port> RTP/AVP 98
+ a=rtpmap:98 3gpp-tt/1000
+ a=fmtp:98 tx=100; ty=100; layer=0; max-h=120; max-w=160; sver=6256,60
+ a=recvonly
+
+ A -> O
+
+ m=video <port> RTP/AVP 98
+ a=rtpmap:98 3gpp-tt/1000
+ a=fmtp:98 tx=100; ty=100; layer=0; height=90; width=100; sver=60;
+ tx3g=82...
+ a=sendonly
+
+ In this case, the offer is different from the previous case: It does
+ not include the stream properties "height", "width", and "tx3g". The
+ answerer copies the "tx", "ty", and "layer" values, thus
+ acknowledging these. "max-h" and "max-w" are not present in the
+ answer because the "tx" and "ty" (and "layer") in this special case
+ do not apply to the received stream, but to the sent stream. Also,
+ if offerer and answerer had very different display sizes, it would
+ not be possible to express the answerer's capabilities. In the
+ example above and for an answerer with a 50x50 display, the
+ translation values are already out of range.
+
+ For sendonly:
+
+ O -> A
+
+ m=video <port> RTP/AVP 98
+ a=rtpmap:98 3gpp-tt/1000
+ a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100;
+ sver=6256,60; tx3g=81...
+ a=sendonly
+
+ A -> O
+
+ m=video <port> RTP/AVP 98
+ a=rtpmap:98 3gpp-tt/1000
+ a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100; max-h=100;
+ max-w=160; sver=60
+ a=recvonly
+
+
+ Note that "max-h" and "max-w" are not present in the offer. Also,
+ with this answer, the answerer would accept the offer as is (thus
+ echoing "tx", "ty", "height", "width", and "layer") and additionally
+ inform the offerer about its capabilities: "max-h" and "max-w".
+
+ Another possible answer for this case would be:
+
+ A -> O
+
+ m=video <port> RTP/AVP 98
+ a=rtpmap:98 3gpp-tt/1000
+ a=fmtp:98 tx=120; ty=105; layer=0; max-h=95; max-w=150; sver=60
+ a=recvonly
+
+ In this case, the answerer does not accept the values offered. The
+ offerer MUST use these values or else remove the stream.
+
+9.4. Parameter Usage outside of Offer/Answer
+
+ SDP may also be employed outside of the Offer/Answer context, for
+ instance for multimedia sessions that are announced through the
+ Session Announcement Protocol (SAP) [14] or streamed through the Real
+ Time Streaming Protocol (RTSP) [15].
+
+ In this case, the receiver of a session description is required to
+ support the parameters and given values for the streams, or else it
+ MUST reject the session. It is the responsibility of the sender (or
+ creator) of the session descriptions to define the session parameters
+ so that the probability of unsuccessful session setup is minimized.
+ This is out of the scope of this document.
+
+10. IANA Considerations
+
+ IANA has registered the media subtype name "3gpp-tt" for the media
+ type "video" as specified in Section 8 of this document.
+
+11. Security Considerations
+
+ RTP packets using the payload format defined in this specification
+ are subject to the security considerations discussed in the RTP
+ specification [3] and any applicable RTP profile, e.g., AVP [17].
+
+ In particular, an attacker may invalidate the current set of active
+ sample descriptions at the client by means of repeating a packet with
+ an old sample description, i.e., a replay attack. This would mean that
+ the display of the text would be corrupted, if displayed at all.
+ Another form of attack may consist of sending redundant fragments,
+ whose boundaries do not match the exact boundaries of the originals
+ (as indicated by LEN) or fragments that carry different sample
+ lengths (SLEN). This may cause a decoder to crash.
+
+ These types of attack may easily be avoided by using source
+ authentication and integrity protection.
+
+ Additionally, peers in a timed text session may desire to retain
+ privacy in their communication, i.e., confidentiality.
+
+ This payload format does not provide any mechanisms for achieving
+ these. Confidentiality, integrity protection, and authentication
+ have to be solved by a mechanism external to this payload format,
+ e.g., SRTP [10].
+
+12. References
+
+12.1. Normative References
+
+ [1] Transparent end-to-end packet switched streaming service (PSS);
+ Timed Text Format (Release 6), TS 26.245 v 6.0.0, June 2004.
+
+ [2] ISO/IEC 14496-12:2004 Information technology - Coding of audio-
+ visual objects - Part 12: ISO base media file format.
+
+ [3] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
+ "RTP: A Transport Protocol for Real-Time Applications", STD 64,
+ RFC 3550, July 2003.
+
+ [4] Handley, M. and V. Jacobson, "SDP: Session Description
+ Protocol", RFC 2327, April 1998.
+
+ [5] Bradner, S., "Key words for use in RFCs to Indicate Requirement
+ Levels", BCP 14, RFC 2119, March 1997.
+
+ [6] Josefsson, S., "The Base16, Base32, and Base64 Data Encodings",
+ RFC 3548, July 2003.
+
+12.2. Informative References
+
+ [7] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
+ Generic Forward Error Correction", RFC 2733, December 1999.
+
+ [8] Perkins, C. and O. Hodson, "Options for Repair of Streaming
+ Media", RFC 2354, June 1998.
+
+ [9] W3C, "Synchronised Multimedia Integration Language (SMIL 2.0)",
+ August, 2001.
+
+
+ [10] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+ Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
+ 3711, March 2004.
+
+ [11] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R. Hakenberg,
+ "RTP Retransmission Payload Format", Work in Progress, September
+ 2005.
+
+ [12] van der Meer, J., Mackie, D., Swaminathan, V., Singer, D., and
+ P. Gentric, "RTP Payload Format for Transport of MPEG-4
+ Elementary Streams", RFC 3640, November 2003.
+
+ [13] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
+ Session Description Protocol (SDP)", RFC 3264, June 2002.
+
+ [14] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
+ Protocol", RFC 2974, October 2000.
+
+ [15] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
+ Protocol (RTSP)", RFC 2326, April 1998.
+
+ [16] Transparent end-to-end packet switched streaming service (PSS);
+ Protocols and codecs (Release 6), TS 26.234 v 6.1.0, September
+ 2004.
+
+ [17] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
+ Conferences with Minimal Control", STD 65, RFC 3551, July 2003.
+
+ [18] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD
+ 63, RFC 3629, November 2003.
+
+ [19] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646",
+ RFC 2781, February 2000.
+
+ [20] Friedman, T., Caceres, R., and A. Clark, "RTP Control Protocol
+ Extended Reports (RTCP XR)", RFC 3611, November 2003.
+
+ [21] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
+ "Extended RTP Profile for RTCP-based Feedback (RTP/AVPF)", Work
+ in Progress, August 2004.
+
+ [22] Hellstrom, G., "RTP Payload for Text Conversation", RFC 2793,
+ May 2000.
+
+ [23] Hellstrom, G. and P. Jones, "RTP Payload for Text Conversation",
+ RFC 4103, June 2005.
+
+
+ [24] ITU-T Recommendation T.140 (1998) - Text conversation protocol
+ for multimedia application, with amendment 1, (2000).
+
+ [25] ISO/IEC 10646-1: (1993), Universal Multiple Octet Coded
+ Character Set.
+
+ [26] ISO/IEC FCD 14496-17 Information technology - Coding of audio-
+ visual objects - Part 17: Streaming text format, Work in
+ progress, June 2004.
+
+ [27] Transparent end-to-end Packet-switched Streaming Service (PSS);
+ 3GPP SMIL language profile, (Release 6), TS 26.246 v 6.0.0, June
+ 2004.
+
+ [28] Casner, S. and P. Hoschka, "MIME Type Registration of RTP
+ Payload Formats", RFC 3555, July 2003.
+
+ [29] Freed, N. and J. Klensin, "Media Type Specifications and
+ Registration Procedures", BCP 13, RFC 4288, December 2005.
+
+ [30] Transparent end-to-end packet switched streaming service (PSS);
+ 3GPP file format (3GP) (Release 6), TS 26.244 V6.3. March 2005.
+
+ [31] Castagno, R. and D. Singer, "MIME Type Registrations for 3rd
+ Generation Partnership Project (3GPP) Multimedia files", RFC
+ 3839, July 2004.
+
+
+13. Basics of the 3GP File Structure
+
+ This section provides a coarse overview of the 3GP file structure,
+ which follows the ISO Base Media File Format [2].
+
+ Each 3GP file consists of "Boxes". In general, a 3GP file contains
+ the File Type Box (ftyp), the Movie Box (moov), and the Media Data
+ Box (mdat). The File Type Box identifies the type and properties of
+ the 3GP file itself. The Movie Box and the Media Data Box serve as
+ containers and include their own boxes for each medium. Every box
+ starts with a header that indicates both its size and its type
+ (these fields are named "size" and "type", respectively).
+ Additionally, a box may itself contain other boxes.
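+
+ The box header described above can be walked with a few lines of
+ code. The following is a minimal sketch, not a complete parser: the
+ names iter_boxes and make_box are hypothetical, and the test bytes
+ are synthetic rather than taken from a real 3GP file. It assumes
+ the ISO Base Media File Format header layout (32-bit big-endian
+ "size", 4-character "type", and the special size values 0 and 1).

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload_start, payload_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        # Header: 32-bit big-endian "size", then 4-character "type".
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # size == 1: a 64-bit size follows the type field.
            if offset + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # size == 0: the box runs to the end of the data (last box only).
            size = end - offset
        if size < header or offset + size > end:
            break  # malformed or truncated box; stop rather than guess
        yield box_type.decode("ascii", "replace"), offset + header, offset + size
        offset += size


def make_box(box_type, payload=b""):
    """Build a synthetic box for illustration (hypothetical helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic top-level layout: ftyp, moov (holding an empty trak), mdat.
sample = (
    make_box(b"ftyp", b"3gp6")
    + make_box(b"moov", make_box(b"trak"))
    + make_box(b"mdat", b"\x00\x02Hi")
)
top_level = list(iter_boxes(sample))
```

+ Because a box may contain other boxes, the same function can be
+ applied again to the payload range of a container such as moov.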
+
+ In the following, only those boxes are mentioned that are useful for
+ the purposes of this payload format.
+
+ The Movie Box (moov) contains one or more Track Boxes (trak), which
+ include information about each track. A Track Box contains, among
+ others, the Track Header Box (tkhd) and, nested inside its Media Box
+ (mdia), the Media Header Box (mdhd) and the Media Information Box
+ (minf).
+
+ The Track Header Box specifies the characteristics of a single track,
+ where a track is, in this case, the streamed text during a session.
+ Exactly one Track Header Box is present for a track. It contains
+ information about the track, such as the spatial layout (width and
+ height), the video transformation matrix, and the layer number.
+ Since these pieces of information are essential and static (i.e.,
+ constant) for the duration of the session, they must be sent prior to
+ the transmission of any text samples.
+
+ The Media Header Box contains the "timescale" or number of time units
+ that pass in one second, i.e., cycles per second or Hertz. The Media
+ Information Box includes the Sample Table Box (stbl), which contains
+ all the time and data indexing of the media samples in a track. Using
+ this box, it is possible to locate samples in time and to determine
+ their type, size, container, and offset into that container. Inside
+ the Sample Table Box are the Sample Description Box (stsd, for
+ finding sample descriptions), the Decoding Time to Sample Box (stts,
+ for finding sample durations), the Sample Size Box (stsz, for
+ finding sample sizes), and the Sample to Chunk Box (stsc, for
+ finding the sample description index).
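+
+ As an illustration of how the timescale and the Decoding Time to
+ Sample Box work together, the sketch below converts sample durations
+ into seconds. It assumes that the stts box has already been decoded
+ into (sample_count, sample_delta) runs; the function name
+ sample_times and the example values are hypothetical.

```python
from fractions import Fraction


def sample_times(stts_entries, timescale):
    """Expand (sample_count, sample_delta) runs into (start, duration) in seconds."""
    times = []
    t = 0  # running decoding time, in timescale units
    for count, delta in stts_entries:
        for _ in range(count):
            # Dividing by the timescale converts time units to seconds.
            times.append((Fraction(t, timescale), Fraction(delta, timescale)))
            t += delta
    return times


# Hypothetical track: timescale of 1000 units per second, two samples
# lasting 3000 units each, then one sample lasting 1500 units.
times = sample_times([(2, 3000), (1, 1500)], timescale=1000)
```

+ Exact fractions are used here so that no rounding error accumulates
+ over a long track; a real sender might instead keep integer time
+ units throughout.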
+
+ Finally, the Media Data Box contains the media data itself. In timed
+ text tracks, this box contains text samples; for audio and video
+ tracks, the equivalents are audio and video frames, respectively. A
+ text sample consists of the text length, the text string, and zero
+ or more Modifier Boxes. The text length is the size of the text in
+ bytes.
+ The text string is the plain text to render. A Modifier Box carries
+ additional rendering information for the text, such as color and
+ font.
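+
+ The text sample layout can be sketched as follows. This is a
+ minimal illustration under stated assumptions, not a normative
+ parser: it assumes a 16-bit big-endian text length, text encoded as
+ UTF-8 unless it starts with a UTF-16 byte order mark, and Modifier
+ Boxes that use the common box header. The function name
+ parse_text_sample and the example bytes are hypothetical, and the
+ modifier payload is treated as opaque.

```python
import struct


def parse_text_sample(sample):
    """Split a text sample into (text, [(modifier_type, payload), ...])."""
    # Assumption: the text length is a 16-bit big-endian byte count.
    (text_len,) = struct.unpack_from(">H", sample, 0)
    raw = sample[2:2 + text_len]
    if len(raw) != text_len:
        raise ValueError("text sample shorter than its text length field")
    # Assumption: a leading byte order mark signals UTF-16, else UTF-8.
    if raw[:2] in (b"\xfe\xff", b"\xff\xfe"):
        text = raw.decode("utf-16")
    else:
        text = raw.decode("utf-8")
    modifiers = []
    offset = 2 + text_len
    while offset + 8 <= len(sample):
        # Each Modifier Box starts with the usual "size" and "type" header.
        size, box_type = struct.unpack_from(">I4s", sample, offset)
        if size < 8 or offset + size > len(sample):
            raise ValueError("malformed modifier box")
        payload = sample[offset + 8:offset + size]
        modifiers.append((box_type.decode("ascii"), payload))
        offset += size
    return text, modifiers


# Synthetic sample: "Hello" followed by one modifier box with an
# illustrative 4-byte payload.
text_bytes = "Hello".encode("utf-8")
hlit = struct.pack(">I4sHH", 12, b"hlit", 0, 5)
sample = struct.pack(">H", len(text_bytes)) + text_bytes + hlit
text, modifiers = parse_text_sample(sample)
```

+ Note that the text length counts bytes, not characters, so the
+ string must be sliced before it is decoded.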
+
+14. Acknowledgements
+
+ The authors would like to thank Dave Singer, Jan van der Meer, Magnus
+ Westerlund, and Colin Perkins for their comments and suggestions
+ about this document.
+
+ The authors would also like to thank Markus Gebhard for the free and
+ publicly available JavE ASCII Editor (used for the ASCII drawings in
+ this document) and Henrik Levkowetz for the Idnits web service.
+
+Authors' Addresses
+
+ Jose Rey
+ Panasonic R&D Center Germany GmbH
+ Monzastr. 4c
+ D-63225 Langen, Germany
+
+ EMail: jose.rey@eu.panasonic.com
+ Phone: +49-6103-766-134
+ Fax: +49-6103-766-166
+
+
+ Yoshinori Matsui
+ Matsushita Electric Industrial Co., LTD.
+ 1006 Kadoma
+ Kadoma-shi, Osaka, Japan
+
+ EMail: matsui.yoshinori@jp.panasonic.com
+ Phone: +81 6 6900 9689
+ Fax: +81 6 6900 9699
+
+Full Copyright Statement
+
+ Copyright (C) The Internet Society (2006).
+
+ This document is subject to the rights, licenses and restrictions
+ contained in BCP 78, and except as set forth therein, the authors
+ retain all their rights.
+
+ This document and the information contained herein are provided on an
+ "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+ OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+ ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+ INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+ INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+ WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+ The IETF takes no position regarding the validity or scope of any
+ Intellectual Property Rights or other rights that might be claimed to
+ pertain to the implementation or use of the technology described in
+ this document or the extent to which any license under such rights
+ might or might not be available; nor does it represent that it has
+ made any independent effort to identify any such rights. Information
+ on the procedures with respect to rights in RFC documents can be
+ found in BCP 78 and BCP 79.
+
+ Copies of IPR disclosures made to the IETF Secretariat and any
+ assurances of licenses to be made available, or the result of an
+ attempt made to obtain a general license or permission for the use of
+ such proprietary rights by implementers or users of this
+ specification can be obtained from the IETF on-line IPR repository at
+ http://www.ietf.org/ipr. The IETF invites any interested party to
+ bring to its attention any copyrights, patents or patent
+ applications, or other proprietary rights that may cover technology
+ that may be required to implement this standard. Please address the
+ information to the IETF at ietf-ipr@ietf.org.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is provided by the IETF
+ Administrative Support Activity (IASA).
+