doc: Add RFC documents

author: Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer: Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit: 4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
tree: e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc4396.txt
parent: ea76e11061bda059ae9f9ad130a9895cc85607db (diff)
1 files changed, 3699 insertions, 0 deletions
diff --git a/doc/rfc/rfc4396.txt b/doc/rfc/rfc4396.txt
new file mode 100644
index 0000000..be4f173
--- /dev/null
+++ b/doc/rfc/rfc4396.txt
@@ -0,0 +1,3699 @@
+
+
+
+
+
+
+Network Working Group                                             J. Rey
+Request for Comments: 4396                                     Y. Matsui
+Category: Standards Track                                      Panasonic
+                                                           February 2006
+
+
+                           RTP Payload Format
+       for 3rd Generation Partnership Project (3GPP) Timed Text
+
+Status of This Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2006).
+
+Abstract
+
+   This document specifies an RTP payload format for the transmission of
+   3GPP (3rd Generation Partnership Project) timed text.  3GPP timed
+   text is a time-lined, decorated text media format with defined
+   storage in a 3GP file.  Timed Text can be synchronized with
+   audio/video contents and used in applications such as captioning,
+   titling, and multimedia presentations.  In the following sections,
+   the problems of streaming timed text are addressed, and a payload
+   format for streaming 3GPP timed text over RTP is specified.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Rey & Matsui                Standards Track                     [Page 1]
+
+RFC 4396          Payload Format for 3GPP Timed Text       February 2006
+
+
+Table of Contents
+
+   1. Introduction ....................................................3
+   2. Motivation, Requirements, and Design Rationale ..................3
+      2.1. Motivation .................................................3
+      2.2. Basic Components of the 3GPP Timed Text Media Format .......4
+      2.3. Requirements ...............................................5
+      2.4. Limitations ................................................6
+      2.5. Design Rationale ...........................................7
+   3. Terminology ....................................................10
+   4. RTP Payload Format for 3GPP Timed Text .........................12
+      4.1. Payload Header Definitions ................................13
+           4.1.1. Common Payload Header Fields .......................15
+           4.1.2. TYPE 1 Header ......................................17
+           4.1.3. TYPE 2 Header ......................................20
+           4.1.4. TYPE 3 Header ......................................23
+           4.1.5. TYPE 4 Header ......................................24
+           4.1.6. TYPE 5 Header ......................................25
+      4.2. Buffering of Sample Descriptions ..........................25
+           4.2.1. Dynamic SIDX Wraparound Mechanism ..................26
+      4.3. Finding Payload Header Values in 3GP Files ................28
+      4.4. Fragmentation of Timed Text Samples .......................31
+      4.5. Reassembling Text Samples at the Receiver .................33
+      4.6. On Aggregate Payloads .....................................35
+      4.7. Payload Examples ..........................................39
+      4.8. Relation to RFC 3640 ......................................43
+      4.9. Relation to RFC 2793 ......................................44
+   5. Resilient Transport ............................................45
+   6. Congestion Control .............................................46
+   7. Scene Description ..............................................47
+      7.1. Text Rendering Position and Composition ...................47
+      7.2. SMIL Usage ................................................48
+      7.3. Finding Layout Values in a 3GP File .......................48
+   8. 3GPP Timed Text Media Type .....................................49
+   9. SDP Usage ......................................................53
+      9.1. Mapping to SDP ............................................53
+      9.2. Parameter Usage in the SDP Offer/Answer Model .............53
+           9.2.1. Unicast Usage ......................................54
+           9.2.2. Multicast Usage ....................................57
+      9.3. Offer/Answer Examples .....................................58
+      9.4. Parameter Usage outside of Offer/Answer ...................60
+   10. IANA Considerations ...........................................60
+   11. Security Considerations .......................................60
+   12. References ....................................................61
+      12.1. Normative References .....................................61
+      12.2. Informative References ...................................61
+   13. Basics of the 3GP File Structure ..............................64
+   14. Acknowledgements ..............................................65
+
+
+
+Rey & Matsui                Standards Track                     [Page 2]
+
+RFC 4396          Payload Format for 3GPP Timed Text       February 2006
+
+
+1.  Introduction
+
+   3GPP timed text is a media format for time-lined, decorated text
+   specified in the 3GPP Technical Specification TS 26.245, "Transparent
+   end-to-end packet switched streaming service (PSS); Timed Text Format
+   (Release 6)" [1].  Besides plain text, the 3GPP timed text format
+   allows the creation of decorated text such as that for karaoke
+   applications, scrolling text for newscasts, or hyperlinked text.
+   These contents may or may not be synchronized with other media, such
+   as audio or video.
+
+   The purpose of this document is to provide a means to stream 3GPP
+   timed text contents using RTP [3].  This includes the streaming of
+   timed text being read out of a (3GP) file, as well as the streaming
+   of timed text generated in real-time, a.k.a. live streaming.
+
+   Section 2 contains the motivation for this document, an overview of
+   the media format, the requirements, and the design rationale.
+   Section 3 defines the terminology used.  Section 4 specifies the
+   payload headers, the fragmentation and re-assembly rules for text
+   samples, the rules for payload aggregation, and the relations of this
+   document to RFC 3640 [12] and RFC 2793 [22].  Section 5 specifies
+   some simple schemes for resilient transport and gives pointers to
+   other possible mechanisms.  Section 6 addresses congestion control.
+   Section 7 specifies scene description.  Section 8 defines the media
+   type.  Section 9 specifies SDP for unicast and multicast sessions,
+   including usage in the Offer/Answer model [13].  Sections 10 and 11
+   address IANA and security considerations.  Section 12 lists
+   references.  Basics of the 3GP File Structure are in Section 13.
+
+2.  Motivation, Requirements, and Design Rationale
+
+2.1.  Motivation
+
+   The 3GPP timed text format was developed for use in the services
+   specified in the 3GPP Transparent End-to-end Packet-switched
+   Streaming Services (3GPP PSS) specification [16].
+
+   As of today, PSS allows downloading 3GPP timed text contents stored
+   in 3GP files.  However, due to the lack of an RTP payload format, it
+   is not possible to stream 3GPP timed text contents over RTP.
+
+   This document specifies such a payload format.
+
+
+
+
+
+
+
+
+Rey & Matsui                Standards Track                     [Page 3]
+
+RFC 4396          Payload Format for 3GPP Timed Text       February 2006
+
+
+2.2.  Basic Components of the 3GPP Timed Text Media Format
+
+   Before going into the details of the design, it is necessary to know
+   how the media format is constructed.  We can identify four
+   differentiated functional components: layout information, default
+   formatting, text strings, and decoration.  In the following, we
+   briefly explain these and match them to their designations in a 3GP
+   file:
+
+        o Initial spatial layout information related to the text
+          strings: These are the height and width of the text region
+          where text is displayed, the position of the text region in
+          the display, and the layer or proximity of the text to the
+          user.  In 3GP files, this information is contained in the
+          Track Header Box (3GP file designations are capitalized for
+          clarity).
+
+        o Default settings for formatting and positioning of text: style
+          (font, size, color,...), background color, horizontal and
+          vertical justification, line width, scrolling, etc.  For 3GP
+          files, this corresponds to the Sample Descriptions.
+
+        o The actual text strings: encoded characters using either UTF-8
+          [18] or UTF-16 [19] encoding.
+
+        o The decoration: If some characters have different style,
+          delay, blink, etc., this needs to be indicated.  The
+          decoration is only present in the text samples if it is
+          actually needed.  Otherwise, the default settings as above
+          apply.  In 3GP files, within each Text Sample, the decoration
+          (i.e., Modifier Boxes) is appended to the text strings, if
+          needed.  At the time of writing this payload format, the
+          following modifiers are specified in the 3GPP timed text media
+          format specification [1]:
+
+           - text highlight
+           - highlight color
+           - blinking text
+           - karaoke feature
+           - hyperlink
+           - text delay
+           - text style
+           - positioning of the text box
+           - text wrap indication
+
+
+
+
+
+
+
+Rey & Matsui                Standards Track                     [Page 4]
+
+RFC 4396          Payload Format for 3GPP Timed Text       February 2006
+
+
+2.3.  Requirements
+
+   Once the basic components are known, it is necessary to define which
+   requirements the payload format shall fulfill:
+
+     1. It shall enable both live streaming and streaming from a 3GP
+        file.
+
+                Informative note: For the purpose of this document, the
+                term "live streaming" refers to those scenarios where
+                the timed text stream is sent from a live encoder.  Upon
+                reception, the content may or may not be stored in a 3GP
+                file.  Typically, in live streaming applications, the
+                sender encapsulates the timed text content in RTP
+                packets following the guidelines given in this document.
+                At the receiving side, a buffer is used to cancel the
+                network delay and delay jitter.  If receiver and sender
+                support packet loss resilience mechanisms (see Section
+                5), it may also be possible to recover from packet
+                losses.  Note that how sender and receiver actually
+                manage and dimension the buffers is an implementation
+                design choice.
+
+     2. Furthermore, it shall be possible for an RTP receiver using this
+        payload format, and capable of storing in 3GP format, to obtain
+        all necessary information from the RTP packets for storing the
+        received text contents according to the 3GP file format.  This
+        file may or may not be the same as the original file.
+
+                Informative note: The 3GP file format itself is based on
+                the ISO Base Media File Format recommendation [2].
+                Section 13 gives some insight into the 3GP file
+                structure.  Further, Sections 4.3 and 7.3 specify where
+                the information needed for filling in payload headers is
+                found in a 3GP file.  For live streaming, appropriate
+                values complying with the format and units described in
+                [1] shall be used.  Where needed, clarifications on
+                appropriate values are given in this document.
+
+     3. It shall enable efficient and resilient transport of timed text
+        contents over RTP.  In particular:
+
+          a. Enable the transmission of the sample descriptions by both
+             out-of-band and in-band means.  Sample descriptions are
+             important information, which potentially apply to several
+             text samples.  These default formatting settings are
+             typically transmitted out-of-band (reliably) once at the
+             initialization phase.  If additional sample descriptions
+
+
+
+Rey & Matsui                Standards Track                     [Page 5]
+
+RFC 4396          Payload Format for 3GPP Timed Text       February 2006
+
+
+             are needed in the course of a session, these may also be
+             sent out-of-band or in-band.  In-band transmission,
+             although unreliable, may be more appropriate for sending
+             sample descriptions if these should be sent frequently, as
+             opposed to establishing an additional communication channel
+             for SDP, for example.  It is also useful in cases where an
+             out-of-band channel may not be available and for live
+             streaming, where contents are not known a priori.  Thus,
+             the payload format shall enable out-of-band and in-band
+             transmission of sample descriptions.  Section 4.1.6
+             specifies a payload header for transmitting sample
+             descriptions in-band.  Section 9 specifies how sample
+             descriptions are mapped to SDP.
+
+          b. Enable the fragmentation of a text sample into several RTP
+             packets in order to cover a wide range of applications and
+             network environments.  In general, fragmentation should be
+             a rare event, given the low bit rates and relatively small
+             text sample sizes.  However, the 3GPP Timed Text media
+             format does allow for larger text samples.  Therefore, the
+             payload format shall take this into account and provide a
+             means for coping with fragmentation and reassembly. Section
+             4.4 deals with fragmentation.
+
+          c. Enable the aggregation of units into an RTP packet for
+             making the transport more efficient.  In a mobile
+             communication environment, a typical text sample size is
+             around 100-200 bytes.  If the available bit rate and the
+             packet size allow it, units should be aggregated into one
+             RTP packet.  Section 4.6 deals with aggregation.
+
+          d. Enable the use of resilient transport mechanisms, such as
+             repetition, retransmission [11], and FEC [7] (see Section
+             5).  For a more general discussion, refer to RFC 2354 [8],
+             which discusses available mechanisms for stream repair.
+
+2.4.  Limitations
+
+     The payload headers have been optimized in size for RTP.  Instead
+     of using 32-bit (S)LEN, SDUR, and SIDX header fields, which would
+     carry many unused bits much of the time, it has been a design
+     choice to reduce the size of these fields.  As a consequence, this
+     payload format has reduced maximum values with respect to sizes and
+     durations of (text) samples and sample descriptions.  These maximum
+     values differ from those allowed in 3GP files, where they are
+     expressed using 32-bit (unsigned) integers.  In some cases,
+
+
+
+
+
+Rey & Matsui                Standards Track                     [Page 6]
+
+RFC 4396          Payload Format for 3GPP Timed Text       February 2006
+
+
+     extension mechanisms are provided to deal with larger values.
+     However, it is noted that the values used here should be enough for
+     the streaming applications targeted.
+
+     The following limitations apply:
+
+     1. The maximum size of text samples carried in RTP packets is
+        restricted to be a 16-bit (unsigned) integer (this includes the
+        text strings and modifiers).  This means a maximum size for the
+        unit would be about 64 Kbytes.  No extension mechanism is
+        provided.
+
+     2. The sample description index values are restricted to be an 8-
+        bit (unsigned) integer.  An extension mechanism is given in
+        Section 4.3.
+
+     3. The text sample duration is restricted to be a 24-bit (unsigned)
+        integer.  This yields a maximum duration at a timestamp
+        clockrate of 1000 Hz of about 4.6 hours.  Nevertheless, an
+        extension mechanism is provided in Section 4.3.
+
+     4. Sample descriptions are also restricted in size: If the size
+        cannot be expressed as a 16-bit (unsigned) integer, the sample
+        description shall not be conveyed.  As in the case of the sample
+        size, no extension mechanism is provided.
+
+     5. A further limitation concerns the UTF-16 encodings supported:
+        Only transport of text strings following big endian byte order
+        is supported.  See Section 4.1.1 for details.
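
The maximum values implied by the field widths in the list above can be checked with a little arithmetic. The following is a minimal sketch, not part of the RFC; the constant and function names are illustrative assumptions, and only the bit widths and the 1000 Hz clockrate come from the text.

```python
# Field widths taken from the limitations listed above.
LEN_BITS = 16    # text sample size (and sample description size)
SIDX_BITS = 8    # sample description index
SDUR_BITS = 24   # text sample duration, in timestamp ticks


def max_unsigned(bits):
    """Largest value an unsigned integer of the given width can hold."""
    return (1 << bits) - 1


def max_duration_hours(clockrate_hz):
    """Longest duration expressible in SDUR at the given clockrate."""
    return max_unsigned(SDUR_BITS) / clockrate_hz / 3600


if __name__ == "__main__":
    print(max_unsigned(LEN_BITS))     # upper bound on sample size in bytes
    print(max_unsigned(SIDX_BITS))    # upper bound on the index value
    print(max_duration_hours(1000))   # hours at a 1000 Hz clockrate
```

At 1000 Hz the 24-bit duration comes out at a little over 4.6 hours, which matches the figure quoted in item 3.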
+
+2.5.  Design Rationale
+
+   The following design choices were made:
+
+     1. 'Unit' approach: The payload formats specified in this document
+        follow a simple scheme: a 3-byte common header (Common Payload
+        Header) followed by a specific header for each text sample
+        (fragment) type.  Following these headers, the text sample
+        contents are placed (Section 4.1.1 and following).  This
+        structure is called a 'unit'.
+
+        The following units have been devised to comply with the
+        requirements mentioned in Section 2.3:
+
+          a. A TYPE 1 unit that contains one complete text sample,
+
+          b. A TYPE 2 unit that contains a complete text string or a
+             fragment thereof,
+
+
+
+Rey & Matsui                Standards Track                     [Page 7]
+
+RFC 4396          Payload Format for 3GPP Timed Text       February 2006
+
+
+          c. A TYPE 3 unit that contains the complete modifiers or only
+             the first fragment thereof,
+
+          d. A TYPE 4 unit that contains one modifier fragment other
+             than the first, and
+
+          e. A TYPE 5 unit that contains one sample description.
+
+        This 'unit' approach was motivated by the following reasons:
+
+              1. Allows a simple classification of the text samples and
+                 text sample fragments that can be conveyed by the
+                 payload format.
+
+              2. Enables easy interoperability with RFC 3640 [12].
+                 During the development of this payload format, interest
+                 was shown from MPEG-4 standardization participants in
+                 developing a common payload structure for the transport
+                 of 3GPP Timed Text.  While interoperability is not
+                 strictly necessary for this payload format to work, it
+                 has been pursued in this payload format.  Section 4.8
+                 explains how this is done.
+
+     2. Character count is not implemented.  This payload format does
+        detect lost text sample fragments, but it does not enable an
+        RTP receiver to find out the exact number of text characters
+        lost.  In fact, the fragment size included in the payload
+        headers does not help in finding the number of lost characters
+        because the UTF-8/UTF-16 [18][19] encodings used yield a
+        variable number of bytes per character.
+
+        For finding the exact number of lost characters, an additional
+        field reflecting the character count (and possibly the character
+        offset) upon fragmentation would be required.  This would
+        additionally require that the entity performing fragmentation
+        count the characters included in each text fragment.
+
+        One benefit of having a character count would be that the
+        display application would be able to replace missing characters
+        through some other character representing character loss.  For
+        example:
+
+             If we take the string "Some text is lost now" and assume
+             the loss of a packet containing the text in the middle,
+             this could be displayed (with a character count):
+
+             "Some ############now"
+
+
+
+
+Rey & Matsui                Standards Track                     [Page 8]
+
+RFC 4396          Payload Format for 3GPP Timed Text       February 2006
+
+
+             As opposed to:
+
+             "Some #now"
+
+             which is what this payload format enables ("#" indicates a
+             missing character or packet, respectively).
+
+        However, it is the consensus of the working group that for
+        applications such as subtitling applications and multimedia
+        presentations that use this payload format, such partial error
+        correction is not worth the cost of including two additional
+        fields; namely, character count and character offset.  Instead,
+        it is recommended that some more overhead be invested to provide
+        full error correction by protecting the text sample
+        fragments using the measures outlined in Section 5.
+
+     3. Fragment re-assembly: In order to re-assemble the text samples,
+        offset information is needed.  Instead of a character or byte
+        offset, a single byte, TOTAL/THIS, is used.  These two values
+        indicate the total number and current index of fragments of a
+        text sample.  This is simpler than having a character offset
+        field in each fragment.  Details are in Section 4.1.3.
+
+     4. A length field, LEN, is present in the common header fields.
+        While the length in the RTP payload format is not needed by most
+        RTP applications (typically lower layers, like UDP, provide this
+        information), it does ease interoperability with RFC 3640.  This
+        is because the Access Units (AUs) used for carriage of data in
+        RFC 3640 must include a length indication.  Details are in
+        Section 4.8.
+
+     5. The header fields in the specific payload headers (TYPE headers
+        in Sections 4.1.2 to 4.1.6) have been arranged for easy
+        processing on 32-bit machines.  For this reason, the fields SIDX
+        and SDUR are swapped in TYPE 1 unit, compared to the other
+        units.
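
Design choice 2 rests on the fact that a byte count does not determine a character count under UTF-8 or UTF-16. The sketch below illustrates this with Python's built-in codecs; the helper name `encoded_sizes` and the sample strings are assumptions made for illustration only.

```python
def encoded_sizes(text):
    """Return (characters, UTF-8 bytes, UTF-16 big-endian bytes)."""
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-16-be")),
    )


if __name__ == "__main__":
    # ASCII, a Latin letter with an accent, and a CJK ideograph.
    for sample in ("lost", "\u00e9", "\u5931"):
        print(repr(sample), encoded_sizes(sample))
```

A receiver that only knows how many bytes went missing therefore cannot tell how many characters to substitute, which is why a separate character count field would have been needed.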
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Rey & Matsui                Standards Track                     [Page 9]
+
+RFC 4396          Payload Format for 3GPP Timed Text       February 2006
+
+
+3.  Terminology
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in RFC 2119 [5].
+
+   Furthermore, the following terms are used and have specific meaning
+   within the context of this document:
+
+   text sample or whole text sample
+
+        In the 3GPP Timed Text media format [1], these terms refer to a
+        unit of timed text data as contained in the source (3GP) file.
+        This includes the text string byte count, possibly a Byte Order
+        Mark, the text string and any modifiers that may follow.  Its
+        equivalent in audio/video would be a frame.
+
+        In this document, however, a text sample contains only text
+        strings followed by zero or more modifiers.  This definition of
+        text sample excludes the 16-bit text string byte count and the
+        16-bit Byte Order Mark (BOM) present in 3GP file text samples
+        (see Section 4.3 and Figure 9).  The 16-bit BOM is not
+        transported in RTP, as explained in Section 4.1.1.
+
+   text strings
+
+        The actual text characters encoded either as UTF-8 or UTF-16.
+        When using this payload format, the text string does not contain
+        any byte order mark (BOM).  See Figure 9 for details.
+
+   fragment or text sample fragment
+
+        A fraction of a text sample.  A fragment may contain either text
+        strings or modifier (decoration) contents, but not both at the
+        same time.
+
+   sample contents
+
+        General term to identify timed text data transported when using
+        this payload format.  Sample contents may be one or several text
+        samples, sample descriptions, and sample fragments (note that,
+        as per Section 4.6, there is only one case in which more than
+        one fragment may be included in a payload).
+
+
+
+
+
+
+
+
+Rey & Matsui                Standards Track                    [Page 10]
+
+RFC 4396          Payload Format for 3GPP Timed Text       February 2006
+
+
+   decoration or modifiers
+
+        These terms are used interchangeably throughout the document to
+        denote the contents of the text sample that modify the default
+        text formatting.  Modifiers may, for example, specify different
+        font size for a particular sequence of characters or define
+        karaoke timing for the sample.
+
+   sample description
+
+        Information that is potentially shared by more than one text
+        sample.  In a 3GP file, a sample description is stored in a
+        place where it can be shared.  It contains setup and default
+        information such as scrolling direction, text box position,
+        delay value, default font, background color, etc.
+
+   units or transport units
+
+        The payload headers specified in this document encapsulate text
+        samples, fragments thereof, and sample descriptions by placing a
+        common header and specific payload header (Sections 4.1.1 to
+        4.1.6) before them, thus building what is here called a
+        (transport) unit.
+
+   aggregation or aggregate packet
+
+        The payload of an aggregate (RTP) packet consists of several
+        (transport) units.
+
+   track or stream
+
+        3GP files contain audio/video and text tracks.  This document
+        enables streaming of text tracks using RTP.  Therefore, these
+        terms are used interchangeably in this document in the context
+        of 3GP files.
+
+   Media Header Box / Track Header Box / ...
+
+        The 3GP file format makes use of these structures defined in the
+        ISO Base Media File Format [2].  When referring to these in
+        this document, initials are capitalized for clarity.
+
+
+
+
+
+
+
+
+
+
+Rey & Matsui                Standards Track                    [Page 11]
+
+RFC 4396          Payload Format for 3GPP Timed Text       February 2006
+
+
+4.  RTP Payload Format for 3GPP Timed Text
+
+   The format of an RTP packet containing 3GPP timed text is shown
+   below:
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |V=2|P|X| CC    |M|    PT       |        sequence number        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                           timestamp                           |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |           synchronization source (SSRC) identifier            |
+     /+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    | |U|   R   | TYPE|             LEN               |               :
+    | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               :
+   U| :           (variable header fields depending on TYPE)          :
+   N| :                                                               :
+   I< +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   T| |                                                               |
+    | :                    SAMPLE CONTENTS                            :
+    | |                                               +-+-+-+-+-+-+-+-+
+    | |                                               |
+     \+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+               Figure 1. 3GPP Timed Text RTP Packet Format
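
As a reading aid for Figure 1, the 3-byte common header at the start of each unit can be split into its fields as sketched below. This is a hedged illustration rather than a reference parser: the function name and the returned dictionary are assumptions, and the field widths (U: 1 bit, R: 4 bits, TYPE: 3 bits, LEN: 16 bits in network byte order) are read off the figure.

```python
import struct


def parse_common_header(unit):
    """Split the first 3 bytes of a unit into U, R, TYPE and LEN."""
    if len(unit) < 3:
        raise ValueError("common payload header needs 3 bytes")
    # One byte holding U/R/TYPE, then a 16-bit big-endian LEN.
    first, length = struct.unpack("!BH", unit[:3])
    return {
        "U": first >> 7,            # most significant bit
        "R": (first >> 3) & 0x0F,   # next 4 bits
        "TYPE": first & 0x07,       # lowest 3 bits
        "LEN": length,
    }


if __name__ == "__main__":
    print(parse_common_header(bytes([0x81, 0x00, 0x10])))
```

The sketch assumes the caller has already skipped the RTP header; it does not interpret the TYPE-specific fields that follow.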
+
+   Marker bit (M): The marker bit SHALL be set to 1 if the RTP packet
+   includes one or more whole text samples or the last fragment of a
+   text sample; otherwise, it is set to zero (0).
+
+   Timestamp: The timestamp MUST indicate the sampling instant of the
+   earliest (or only) unit contained in the RTP packet.  The initial
+   value SHOULD be randomly determined, as specified in RTP [3].
+
+        The timestamp value should provide enough timing resolution for
+        expressing the duration of text samples, for synchronizing text
+        with other media, and for performing RTP Control Protocol (RTCP)
+        measurements such as the interarrival delay jitter or the RTCP
+        Packet Receipt Times Report Block (Section 4.3 of RFC 3611
+        [20]).  This complies with RTP, Section 5.1:
+
+             "The resolution of the clock MUST be sufficient for the
+             desired synchronization accuracy and for measuring packet
+             arrival jitter (one tick per video frame is typically not
+             sufficient)".
+
+
+
+
+
+Rey & Matsui                Standards Track                    [Page 12]
+
+RFC 4396          Payload Format for 3GPP Timed Text       February 2006
+
+
+        The above observation applies to both timed text tracks included
+        in a 3GP file and live streaming sessions.  In the case of a 3GP
+        timed text track, the timestamp clockrate is the value of the
+        "timescale" parameter in the Media Header Box for that text
+        track.  Each track in a 3GP file MAY have its own clockrate as
+        specified in the Media Header Box.  Likewise, live streaming
+        applications SHALL use an appropriate timestamp clockrate.  A
+        default value of 1000 Hz is RECOMMENDED.  Other timestamp
+        clockrates MAY be used.  In this case, the typical behavior is
+        to match the 3GPP timed text clockrate to that used by an
+        associated audio or video stream.
+
+        In an aggregate payload, units MUST be placed in play-out order,
+        i.e., earliest first in the payload.  If TYPE 1 units are
+        aggregated, the timestamp of the subsequent units MUST be
+        obtained by adding the timed text sample duration of previous
+        samples to the RTP timestamp value.  There are two exceptions to
+        this rule: TYPE 5 units and an aggregate payload containing two
+        fragments of the same text sample.  The details of the timestamp
+        calculation are given in Section 4.6.
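
The timestamp rule for aggregated TYPE 1 units can be sketched as a running sum. The code below is an illustration under stated assumptions: the function name is hypothetical, durations are taken to be in timestamp ticks, and wraparound at 32 bits follows from the width of the RTP timestamp field in Figure 1. It does not cover the two exceptions (TYPE 5 units and two fragments of the same sample).

```python
RTP_TS_MODULUS = 1 << 32  # the RTP timestamp field is 32 bits wide


def unit_timestamps(rtp_timestamp, durations):
    """Timestamp of each aggregated TYPE 1 unit, in play-out order.

    The first unit takes the RTP timestamp; each later unit adds the
    durations of all units placed before it in the payload.
    """
    timestamps = []
    current = rtp_timestamp
    for duration in durations:
        timestamps.append(current % RTP_TS_MODULUS)
        current += duration
    return timestamps


if __name__ == "__main__":
    print(unit_timestamps(1000, [500, 250, 100]))
```

With a packet timestamp of 1000 and sample durations of 500, 250 and 100 ticks, the three units start at 1000, 1500 and 1750.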
+
+        Finally, timestamp clockrates MUST be signaled by out-of-band
+        means at session setup, e.g., using the media type "rate"
+        parameter in SDP.  See Section 9 for details.
+
+   Payload Type (PT): The payload type is set dynamically and sent by
+   out-of-band means.
+
+   The usage of the remaining RTP header fields (namely, V, P, X, CC, SN
+   and SSRC) follows the rules of RTP and the profile in use.
+
+4.1.  Payload Header Definitions
+
+   The (transport) units specified in this document consist of a set of
+   common fields (U, R, TYPE, LEN), followed by specific header fields
+   (TYPES 1-5) and text sample contents.  See Figure 1 and Figure 2.
+
+   In Figure 2, two example RTP packets are depicted.  The first
+   contains an aggregate RTP payload with two complete text samples, and
+   the second contains one text sample fragment.  After each unit header
+   is explained, detailed payload examples follow in Section 4.7.
+
+                                        +----------------------+
+                                        |                      |
+                                        |   RTP Header         |
+                                        |                      |
+                               ---------+----------------------+
+                               |        |                      |
+                               |        |COMMON + TYPE 1 Header|
+                               |        ........................
+                        UNIT 1 -        |                      |
+                               |        |    Text Sample       |
+                               |        |                      |
+                               |-------\........................
+                                -------/|                      |
+                               |        |COMMON + TYPE 1 Header|
+                               |        ........................
+                        UNIT 2 -        |                      |
+                               |        |    Text Sample       |
+                               |        |                      |
+                               |        |                      |
+                               ---------+----------------------+
+
+                                        +----------------------+
+                                        |                      |
+                                        |   RTP Header         |
+                                        |                      |
+                               ---------+----------------------+
+                               |        |  COMMON + TYPE 2     |
+                               |        |    (or 3 or 4) Hdr   |
+                               |        ........................
+                        UNIT 3 -        |                      |
+                               |        | Text Sample Fragment |
+                               |        |                      |
+                               |        |                      |
+                               ---------+----------------------+
+
+                     Figure 2.  Example RTP packets
+
+4.1.1.  Common Payload Header Fields
+
+   The fields common to all payload headers have the following format:
+
+            0                   1                   2
+            0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
+           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+           |U|   R   |TYPE |             LEN               |
+           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+                Figure 3.  Common payload header fields
+
+   Where:
+
+   o U (1 bit) "UTF Transformation flag": This is used to inform RTP
+     receivers whether UTF-8 (U=0) or UTF-16 (U=1) was used to encode
+     the text string.  UTF-16 text strings transported by this payload
+     format MUST be serialized in big endian order, a.k.a. network byte
+     order.
+
+        Informative note: Timed text clients complying with the 3GPP
+        Timed Text format [1] are only required to understand the big
+        endian serialization.  Thus, in order to ease interoperability,
+        the reverse serialization (little endian) is not supported by
+        this payload format.
+
+     For the payload formats defined in this document, the U bit is only
+     used in TYPE 1 and TYPE 2 headers.  Senders MUST set the U bit to
+     zero in TYPE 3, TYPE 4, and TYPE 5 headers.  Consequently,
+     receivers MUST ignore the U bit in TYPE 3, TYPE 4, and TYPE 5
+     headers.
+
+   o R (4 bits) "Reserved bits": for future extensions.  This field MUST
+     be set to zero (0x0) and MUST be ignored by receivers.
+
+   o TYPE (3 bits) "Type Field": This field specifies which specific
+     header fields follow.  The following TYPE values are defined:
+
+        - TYPE 1, for a whole text sample.
+        - TYPE 2, for a text string fragment (without modifiers).
+        - TYPE 3, for a whole modifier box or the first fragment of a
+          modifier box.
+        - TYPE 4, for a modifier fragment other than first.
+        - TYPE 5, for a sample description.  Exactly one header per
+          sample description.
+        - TYPE 0, 6, and 7 are reserved for future extensions.  Note
+          that future extensions are possible, e.g., a unit that
+          explicitly signals the number of characters present in a
+          fragment (see Section 2.5).  In order to guarantee backwards-
+          compatibility, it SHALL be possible that older clients ignore
+          (newer) units they do not understand, without invalidating the
+          timestamp calculation mechanisms or otherwise preventing them
+          from decoding the other units.
+
+   o LEN (16 bits) "Length Field": This field indicates the size (in
+     bytes) of the LEN field itself plus everything that follows it in
+     the unit, i.e., the remaining header fields and the unit payload
+     (text strings and modifiers, if any).  This definition only
+     excludes the initial U/R/TYPE byte of the common header.  The LEN
+     field follows network byte order.
+
+     The way in which LEN is obtained when streaming out of a 3GP file
+     depends on the particular unit type.  This is explained for each
+     unit in the sections below.
+
+     For live streaming, both sample length and the LEN value for the
+     current fragment MUST be calculated during the sampling process or
+     during fragmentation.
+
+     In general, LEN may take the following values:
+
+      - TYPE = 1, LEN >= 8
+      - TYPE = 2, LEN > 9
+      - TYPE = 3, LEN > 6
+      - TYPE = 4, LEN > 6
+      - TYPE = 5, LEN > 3
+
+     Receivers MUST discard units that do not comply with these values.
+     However, the RTP header fields and the rest of the units in the
+     payload (if any) are still useful, as guaranteed by the requirement
+     for future extensions above.
+
+     In the following subsections the different payload headers for the
+     values of TYPE are specified.
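+
+   As an illustration of the common header layout in Figure 3 and of
+   the LEN limits listed above, the following sketch walks through the
+   units of one RTP payload.  The function names are hypothetical, and
+   stopping at a unit that runs past the end of the payload is an
+   assumption of this sketch, not a rule taken from this document.
+
```python
import struct

# Smallest valid LEN per TYPE: LEN >= 8 for TYPE 1; LEN > 9, > 6, > 6 and
# > 3 for TYPEs 2 to 5.
MIN_LEN = {1: 8, 2: 10, 3: 7, 4: 7, 5: 4}


def parse_common_header(buf, offset=0):
    """Return (u, unit_type, length) for the unit starting at offset."""
    first, length = struct.unpack_from("!BH", buf, offset)
    u = first >> 7            # 0: UTF-8, 1: UTF-16 (big endian)
    unit_type = first & 0x07  # the 4 reserved R bits in between are ignored
    return u, unit_type, length


def iter_units(payload):
    """Yield (u, unit_type, body) for every acceptable unit in a payload.

    body holds the LEN - 2 bytes that follow the LEN field.
    """
    offset = 0
    while len(payload) - offset >= 3:
        u, unit_type, length = parse_common_header(payload, offset)
        end = offset + 1 + length  # LEN excludes only the U/R/TYPE byte
        if length < 2 or end > len(payload):
            break  # assumption: the next unit cannot be located, so stop
        minimum = MIN_LEN.get(unit_type)
        if minimum is not None and length >= minimum:
            yield u, unit_type, payload[offset + 3:end]
        # Reserved TYPEs (0, 6, 7) and units with an invalid LEN are
        # skipped; the remaining units in the payload are still used.
        offset = end
```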
+
+4.1.2.  TYPE 1 Header
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |U|   R   |TYPE |       LEN  (always >=8)       |    SIDX       |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                      SDUR                     |     TLEN      |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |      TLEN     |
+      +-+-+-+-+-+-+-+-+
+
+                    Figure 4.  TYPE 1 Header Format
+
+   This header type is used to transport whole text samples.  This unit
+   should be the most common case, i.e., the text sample should usually
+   be small enough to be transported in one unit without having to
+   separate text strings from modifiers.  In an aggregate (RTP packet)
+   payload containing several text samples, every sample is preceded by
+   its own TYPE 1 header (see Figure 12).
+
+        Informative note: As indicated in Section 3, "Terminology", a
+        text sample is composed of the text strings followed by the
+        modifiers (if any).  This is also how text samples are stored in
+        3GP files.  The separation of a text sample into text strings
+        and modifiers is only needed for large samples (or small
+        available IP MTU sizes; see Section 4.4), and it is accomplished
+        with TYPE 2 and TYPE 3 headers, as explained in the sections
+        below.
+
+   Note also that empty text samples are considered whole text samples,
+   although they do not contain sample contents.  Empty text samples may
+   be used to clear the display or to put an end to samples of unknown
+   duration, for example.  Units without sample contents SHALL have a
+   LEN field value of 8 (0x0008).
+
+   The fields above have the following meaning:
+
+   o U, R, and TYPE, as defined in Section 4.1.1.
+
+   o LEN, in this case, represents the length of the (complete) text
+     sample plus eight (8) bytes of headers.  For finding the length of
+     the text sample in the Sample Size Box of 3GP files, see Section
+     4.3.
+
+   o SIDX (8 bits) "Text Sample Entry Index": This is an index used to
+     identify the sample descriptions.
+
+     The SIDX field is used to find the sample description corresponding
+     to the unit's payload.  There are two types of SIDX values: static
+     and dynamic.
+
+     Static SIDX values are used to identify sample descriptions that
+     MUST be sent out-of-band and MUST remain active during the whole
+     session.  A static SIDX value is unequivocally linked to one
+     particular sample description during the whole session.  Carrying
+     many sample descriptions out-of-band SHOULD be avoided, since these
+     may become large and, ultimately, transport is not the goal of the
+     out-of-band channel.  Thus, this feature is RECOMMENDED for
+     transporting those sample descriptions that provide a set of
+     minimum default format settings.  Static SIDX values MUST fall in
+     the (closed) interval [129,254].
+
+     Dynamic SIDX values are used for sample descriptions sent in-band.
+     Sample descriptions MAY be sent in-band for several reasons:
+     because they are generated in real time, for transport resiliency,
+     or both.  A dynamic SIDX value is unequivocally linked to one
+     particular sample description during the period in which this is
+     active in the session, and it SHALL NOT be modified during that
+     period.  This period MAY be shorter than or equal to the session
+     duration and is not known a priori.  Dynamic SIDX values MUST fall
+     in the closed interval [0,127].  At most 64 dynamic SIDX values are
+     allowed to be active simultaneously, which should be enough for
+     both recorded content and live streaming applications.
+     Nevertheless, a wraparound mechanism is provided in
+     Section 4.2.1 to handle streaming sessions where more than 64 SIDX
+     values might be needed.  Servers MAY make use of dynamic sample
+     descriptions.  Clients MUST be able to receive and interpret
+     dynamic sample descriptions.
+
+     Finally, SIDX values 128 and 255 are reserved for future use.
+
+   o SDUR (24 bits) "Text Sample Duration": indicates the sample
+     duration in RTP timestamp units of the text sample.  For this
+     field, a length of 3 bytes is preferred to 2 bytes.  This is
+     because, for a typical clockrate of 1000 Hz, 16 bits would allow
+     for a maximum duration of just 65 seconds, which might be too short
+     for some streams.  On the other hand, 24 bits at 1000 Hz allow for
+     a maximum duration of about 4.6 hours, while for 90 kHz, this value
+     is about 3 minutes.  These values should be enough for streaming
+     applications.  However, if a larger duration is needed, the
+     extension mechanism specified in Section 4.3 SHALL be used.
+
+     Apart from defining the time period during which the text is
+     displayed, the duration field is also used to find the timestamp of
+     subsequent units within the aggregate RTP packet payload (if any).
+     This is explained in Section 4.6.
+
+     Text samples generally have a known duration at the time of
+     transmission.  However, in some cases such as live streaming, the
+     time for which a text piece shall be presented might not be known a
+     priori.  Thus, the value zero SDUR=0 (0x000000) is reserved to
+     signal unknown duration.  The amount of time that a sample of
+     unknown duration is presented is determined by the timestamp of the
+     next sample that shall be displayed at the receiver: Text samples
+     of unknown duration SHALL be displayed until the next text sample
+     becomes active, as indicated by its timestamp.
+
+     The example below illustrates how units of unknown duration MUST be
+     presented.  If no following text sample is available, what should
+     be displayed is an implementation issue.  For example, a server
+     could send an empty sample to clear the text box.
+
+        Example: Imagine you are in an airport watching the latest news
+        report while you wait for your plane.  Airports are loud, so the
+        news report is transcribed in the lower area of the screen.
+        This area displays two lines of text: the headlines and the
+        words spoken by the news speaker.  As usual, the headlines are
+        shown for a longer time than the rest.  This time is, in
+        principle, unknown to the stream server, which is streaming
+        live.  A headline is just replaced when the next headline is
+        received.
+
+     However, upon storing a text sample with SDUR=0 in a 3GP file, the
+     SDUR value MUST be changed to the effective duration of the text
+     sample, which MUST be always greater than zero (note that the ISO
+     file format [2] explicitly forbids a sample duration of zero).  The
+     effective duration MUST be calculated as the timestamp difference
+     between the current sample (with unknown duration) and the next
+     text sample that is displayed.
+
+     Note that samples of unknown duration SHALL NOT use features that
+     require knowledge of the duration of the sample up front, such as
+     scrolling and karaoke in [1].  This also applies to future
+     extensions of the Timed Text format.  Furthermore, only
+     sample descriptions (TYPE 5 units) MAY follow units of unknown
+     duration in the same aggregate payload.  Otherwise, it would not be
+     possible to calculate the timestamp of these other units.
+
+     For text contents stored in 3GP files, see Section 4.3 for details
+     on how to extract the duration value.  For live streaming, live
+     encoders SHALL assign appropriate values and units according to [1]
+     and later releases.
+
+   o TLEN (16 bits), "Text String Length", is a byte count of the text
+     string.  The decoder needs the text string length in order to know
+     where the modifiers in the payload start.  TLEN is not present in
+     text string fragments (TYPE 2) since it can be derived from the
+     LEN values of the fragments.
+
+     The TLEN value is obtained from the text samples as contained in
+     3GP files.  Refer to Section 4.3.  For live content, the TLEN MUST
+     be obtained during the sampling process.
+
+   o Finally, the actual text sample is placed after the TLEN field.  As
+     defined in Section 3, a text sample consists of a string of
+     characters encoded using either UTF-8 or UTF-16, followed by zero
+     or more modifiers.  Note also that no BOM and no byte count are
+     included in the strings carried in the payload (as opposed to text
+     samples stored in 3GP files [1]).
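+
+   The TYPE 1 layout of Figure 4 and the role of TLEN can be
+   illustrated with a small parser.  This is a sketch under stated
+   assumptions: the names Type1Unit and parse_type1 are hypothetical,
+   the input is one complete unit starting at its U/R/TYPE byte, and
+   rejecting malformed input with an exception is a choice of this
+   sketch rather than behavior mandated here.
+
```python
from typing import NamedTuple


class Type1Unit(NamedTuple):
    sidx: int         # sample description index
    sdur: int         # duration in RTP timestamp units, 0 = unknown
    text: str         # decoded text string
    modifiers: bytes  # raw modifier boxes, possibly empty


def parse_type1(unit: bytes) -> Type1Unit:
    """Parse one whole TYPE 1 unit, starting at its U/R/TYPE byte."""
    if len(unit) < 9:
        raise ValueError("truncated TYPE 1 header")
    u = unit[0] >> 7
    if unit[0] & 0x07 != 1:
        raise ValueError("not a TYPE 1 unit")
    length = int.from_bytes(unit[1:3], "big")
    if length < 8 or len(unit) < 1 + length:
        raise ValueError("invalid LEN")
    sidx = unit[3]
    sdur = int.from_bytes(unit[4:7], "big")
    tlen = int.from_bytes(unit[7:9], "big")
    sample = unit[9:1 + length]  # text string followed by the modifiers
    if tlen > len(sample):
        raise ValueError("TLEN exceeds the sample length")
    # The U bit selects the encoding; UTF-16 is always big endian, no BOM.
    text = sample[:tlen].decode("utf-16-be" if u else "utf-8")
    return Type1Unit(sidx, sdur, text, sample[tlen:])
```
+
+   An empty text sample (LEN = 8) parses to an empty string with no
+   modifiers.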
+
+4.1.3.  TYPE 2 Header
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |U|   R   |TYPE |          LEN( always >9)      | TOTAL | THIS  |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                    SDUR                       |    SIDX       |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |               SLEN            |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+                      Figure 5.  TYPE 2 Header Format
+
+   This header type is used to transport either a whole text string or a
+   fragment of it.  TYPE 2 units SHALL NOT contain modifiers.  In
+   detail:
+
+   o U, R, and TYPE, as defined in Section 4.1.1.
+
+   o SIDX and SDUR, as defined in Section 4.1.2.
+
+        Note that the U, SIDX, and SDUR fields are meaningful since
+        partial text strings can also be displayed.
+
+   o The LEN field (16 bits) indicates the length of the text string
+     fragment plus nine (9) bytes of headers.  Its value is calculated
+     upon fragmentation.  LEN MUST always be greater than nine (0x0009).
+     Otherwise, the unit MUST be discarded.
+
+     According to the guidelines in Section 4.4, text strings MUST be
+     split at character boundaries to allow the display of text
+     fragments.  Therefore, a text fragment MUST contain at least one
+     character in either UTF-8 or UTF-16.  In practice, this is a
+     formality: fragments created by following the guidelines should be
+     much larger.
+
+     Note also that TYPE 2 units do not contain an explicit text string
+     length, TLEN (see TYPE 1).  This is because TYPE 2 units do not
+     contain any modifiers after the text string.  If needed, the length
+     of the received string can be obtained using the LEN values of the
+     TYPE 2 units.
+
+   o The SLEN field (16 bits) indicates the size (in bytes) of the
+     original (whole) text sample to which this fragment belongs.  This
+     length comprises the text string plus any modifier boxes present
+     (and includes neither the byte order mark nor the text string
+     length as mentioned in Section 3, "Terminology").
+
+     Regarding the text sample length: Timed text samples are not
+     generated at regular intervals, nor is there a default sample size.
+     If 3GP files are streamed, the length of the text samples is
+     calculated beforehand and included in the track itself, while for
+     live encoding it is the real time encoder that SHALL choose an
+     appropriate size for each text sample.  In this case, the amount of
+     text 'captured' in a sample depends on the text source and the
+     particular application (see examples below).  Samples may, e.g., be
+     tailored to match the packet MTU as closely as possible or to
+     provide a given redundancy for the available bit rate.  The
+     encoding application MUST also take into account the delay
+     constraints of the real-time session and assess whether FEC,
+     retransmission, or other similar techniques are reasonable options
+     for stream repair.
+
+     The following examples shall illustrate how a real-time encoder may
+     choose its settings to adapt to the scenario constraints.
+
+          Example: Imagine a newscast scenario, where the spoken news is
+          transcribed and synchronized with the image and voice of the
+          reporter.  We assume that the news speaker talks at an average
+          speed of 5 words per second with an average word length of 5
+          characters plus one space per word, i.e., 30 characters per
+          second.  We assume an available IP MTU of 576 bytes and an
+          available bitrate of 576*8 bits per second = 4.6 Kbps.  We
+          assume each character can be encoded using 2 bytes in UTF-16.
+          In this scenario, several constraints may apply; for example:
+          available IP MTU, available bandwidth, allowable delay, and
+          required redundancy.  If the target were to minimize the
+
+          packet overhead, a text sample covering 8 seconds of text
+          would be closest to the IP MTU:
+
+       IP/UDP/RTP/TYPE1 Header + (8-second text sample)
+     = 20 + 8 + 12 + 8 + (~6 chars/word * 5 word/s * 8 s * 2 bytes/char)
+     = 528 bytes < 576 bytes
+
+          For other scenarios, like lossy networks, it may happen that
+          just one packet per sample provides too little redundancy.
+          In this case, the encoder could 'collect' text every second,
+          thus yielding text samples (TYPE 1 units) of 68 bytes, TYPE 1
+          header included.  We can, e.g., include three contiguous text
+          samples in one RTP payload: the current and last two text
+          samples (see below).  This amounts to a total IP packet size
+          of 20 + 8 + 12 + 3*(8 + 60) = 244 bytes.  Now, with the same
+          available bitrate of 4.6 Kbps, these 244-byte packets can be
+          sent redundantly up to two times per second:
+
+          RTP payload (1,2,3)(1,2,3) (2,3,4)(2,3,4) (3,4,5)(3,4,5) ...
+          Time:       <----1s------> <----1s------> <-----1s-----> ...
+
+          This means that each text sample is sent at least six times,
+          which should provide enough redundancy.  Although not as
+          bandwidth efficient (488*8 < 528*8  < 576*8 bps) as the
+          previous packetization, this option increases the stream
+          redundancy while still meeting the delay and bandwidth
+          constraints.
+
+          Another example would be a user sending timed text from a
+          type-in area in the display.  In this case, the text sample is
+          created as soon as the user clicks the 'send' button.
+          Depending on the packet length, fragmentation may be needed.
+
+          In a video conferencing application, text is synchronized with
+          audio and video.  Thus, the text samples shall be displayed
+          long enough to be read by a human, shall fit in the video
+          screen, and shall 'capture' the audio contents rendered during
+          the time the corresponding video and audio is rendered.
+
+     For stored content, see Section 4.3 for details on how to find the
+     SLEN value in a 3GP file.  For live content, the SLEN MUST be
+     obtained during the sampling process.
+
+     Finally, note that clients MAY use SLEN to reserve buffer space for
+     the remaining fragments of a text sample.
+
+   o The fields TOTAL (4 bits) and THIS (4 bits) indicate, respectively,
+     the total number of fragments into which the original text sample
+     (i.e., the text string and its modifiers) has been fragmented and
+     the position of the current fragment in that sequence.  Note that
+     the RTP sequence number alone cannot replace the functionality of
+     the THIS field, since packets (and fragments) may be repeated,
+     e.g., as in repeated transmission (see Section 5).  Thus, an
+     indication of the "fragment offset" is needed.
+
+     The usual "byte offset" field is not used here for two reasons: a)
+     it would take one more byte and b) it does not provide any
+     information on the character offset.  UTF-8/UTF-16 text strings
+     have, in general, a variable character length ranging from 1 to 6
+     bytes.  Therefore, the TOTAL/THIS solution is preferred.  It could
+     also be argued that the LEN and SLEN fields be used for this
+     purpose, but while they would provide information about the
+     completeness of the text sample, they do not specify the order of
+     the fragments.
+
+     In all cases (TYPEs 2, 3 and 4), if the value of THIS is greater
+     than TOTAL or if TOTAL equals zero (0x0), the fragment SHALL be
+     discarded.
+
+   o Finally, the sample contents following the SLEN field consist of a
+     fragment of the UTF-8/UTF-16 character string; no modifiers follow.
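+
+   The packet sizes in the newscast example of the SLEN description
+   above can be re-derived with a few lines of arithmetic.  All
+   constants are the assumptions stated in that example; the names are
+   hypothetical.  Note that the example counts 8 header bytes per TYPE
+   1 unit, i.e., the amount added to LEN; the sketch keeps that figure.
+
```python
# All figures are the assumptions stated in the newscast example.
IP_HEADER = 20                  # bytes
UDP_HEADER = 8                  # bytes
RTP_HEADER = 12                 # bytes
TYPE1_HEADER = 8                # bytes, as counted in the example
CHARS_PER_SECOND = (5 + 1) * 5  # (5 characters + 1 space) * 5 words/s
BYTES_PER_CHAR = 2              # UTF-16, 2 bytes per character assumed
IP_MTU = 576                    # bytes
BUDGET = 576                    # bytes per second (about 4.6 Kbps)


def unit_size(seconds):
    """Size of one TYPE 1 unit carrying `seconds` worth of text."""
    return TYPE1_HEADER + CHARS_PER_SECOND * seconds * BYTES_PER_CHAR


def packet_size(sample_seconds):
    """Size of an IP packet aggregating one unit per listed sample."""
    headers = IP_HEADER + UDP_HEADER + RTP_HEADER
    return headers + sum(unit_size(s) for s in sample_seconds)


single = packet_size([8])        # one 8-second sample per packet
triple = packet_size([1, 1, 1])  # three 1-second samples per packet
print(single, triple, 2 * triple)  # prints: 528 244 488
```
+
+   Both packetizations stay below the 576-byte IP MTU, and sending the
+   244-byte packet twice per second stays within 576 bytes per second.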
+
+4.1.4.  TYPE 3 Header
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |U|   R   |TYPE |        LEN( always >6)        |TOTAL  |  THIS |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                      SDUR                     |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+                      Figure 6.  TYPE 3 Header Format
+
+   This header type is used to transport either the entire modifier
+   contents present in a text sample or just the first fragment of them.
+   This depends on whether the modifier boxes fit in the current RTP
+   payload.
+
+   If a text sample containing modifiers is fragmented, this header MUST
+   be used to transport the first fragment or, if possible, the complete
+   modifiers.
+
+   In detail:
+
+   o The U, R, and TYPE fields are defined as in Section 4.1.1.
+
+   o LEN indicates the length of the modifier contents plus six (6)
+     bytes of headers.  Its value is obtained upon fragmentation.  The
+     LEN field MUST be greater than six (0x0006).  Otherwise, the unit
+     MUST be discarded.
+
+   o The TOTAL/THIS field has the same meaning as for TYPE 2.
+
+     For TYPE 3 units containing the last (trailing) modifier fragment,
+     the value of TOTAL MUST be equal to that of THIS (TOTAL=THIS).  In
+     addition, TOTAL=THIS MUST be greater than one, because the total
+     number of fragments of a text sample is logically always larger
+     than one.
+
+     Otherwise, if TOTAL is different from THIS in a TYPE 3 unit, this
+     means that the unit contains the first fragment of the modifiers.
+
+   o SDUR has the same definition as for TYPE 1.  Since fragments are
+     always transported in their own RTP packets, this field is only
+     needed to know how long this fragment is valid.  This may, e.g., be
+     used to determine how long it should be kept in the display buffer.
+
+   Note that the SLEN and SIDX fields are not present in TYPE 3 unit
+   headers.  This is because a) these fragments do not contain text
+   strings and b) these types of fragments are applied over text string
+   fragments, which already contain this information.
+
+4.1.5.  TYPE 4 Header
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |U|   R   |TYPE |        LEN( always >6)        |TOTAL  |  THIS |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                      SDUR                     |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+                      Figure 7.  TYPE 4 Header Format
+
+   This header type is placed before modifier fragments, other than the
+   first one.
+
+   The U, R, and TYPE fields are used as per Section 4.1.1.
+
+   As for TYPE 3, LEN indicates the length of the modifier contents and
+   SHALL also be obtained upon fragmentation.  The LEN field MUST be
+   greater than six (0x0006).  Otherwise, the unit MUST be discarded.
+
+   TOTAL/THIS is used as in TYPE 2.
+
+   The SDUR field is defined as in TYPE 1.  The reasoning behind the
+   absence of SLEN and SIDX is the same as in TYPE 3 units.
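+
+   The fragment headers of TYPEs 2, 3, and 4 (Figures 5, 6, and 7)
+   share the TOTAL/THIS and SDUR fields at the same offsets; TYPE 2
+   additionally carries SIDX and SLEN.  The sketch below parses all
+   three and applies the LEN and TOTAL/THIS checks stated above.  The
+   names Fragment and parse_fragment are hypothetical, and returning
+   None for a discarded unit is a convention of this sketch.
+
```python
from typing import NamedTuple, Optional

# Header bytes counted by LEN (everything after the U/R/TYPE byte).
HEADER_BYTES = {2: 9, 3: 6, 4: 6}


class Fragment(NamedTuple):
    unit_type: int
    total: int
    this: int
    sdur: int
    sidx: Optional[int]  # only present in TYPE 2
    slen: Optional[int]  # only present in TYPE 2
    contents: bytes


def parse_fragment(unit: bytes) -> Optional[Fragment]:
    """Parse a TYPE 2, 3 or 4 unit; return None if it must be discarded."""
    if len(unit) < 3:
        return None
    unit_type = unit[0] & 0x07
    header = HEADER_BYTES.get(unit_type)
    if header is None:
        return None  # not a fragment unit
    length = int.from_bytes(unit[1:3], "big")
    if length <= header or len(unit) < 1 + length:
        return None  # LEN MUST be greater than 9 (TYPE 2) or 6 (TYPE 3, 4)
    total, this = unit[3] >> 4, unit[3] & 0x0F
    if total == 0 or this > total:
        return None
    sdur = int.from_bytes(unit[4:7], "big")
    sidx = slen = None
    if unit_type == 2:
        sidx = unit[7]
        slen = int.from_bytes(unit[8:10], "big")
    contents = unit[1 + header:1 + length]
    return Fragment(unit_type, total, this, sdur, sidx, slen, contents)
```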
+
+4.1.6.  TYPE 5 Header
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |U|   R   |TYPE |      LEN( always >3)          |   SIDX        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+                      Figure 8.  TYPE 5 Header Format
+
+   This header type is used to transport (dynamic) sample descriptions.
+   Every sample description MUST have its own TYPE 5 header.
+
+   The U, R, and TYPE fields are used as per Section 4.1.1.
+
+   The LEN field indicates the length of the sample description, plus
+   three bytes accounting for the SIDX field and the LEN field itself.
+   Thus, this field MUST be greater than three (0x0003).  Otherwise, the
+   unit MUST be discarded.
+
+   If the sample is streamed from a 3GP file, the length of the sample
+   description contents (i.e., what comes after SIDX in the unit itself)
+   is obtained from the file (see Section 4.3).
+
+   The SIDX field contains a dynamic SIDX value assigned to the sample
+   description carried as sample content of this unit.  As only dynamic
+   sample descriptions are carried using TYPE 5, the possible SIDX
+   values are in the (closed) interval [0,127].
+
+   Senders MAY make use of TYPE 5 units.  All receivers MUST implement
+   support for TYPE 5 units, since it adds minimum complexity and may
+   increase the robustness of the streaming session.
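+
+   The SIDX ranges from Section 4.1.2 and the TYPE 5 layout of Figure 8
+   can be illustrated as follows.  The function names are hypothetical.
+   Discarding a TYPE 5 unit whose SIDX lies outside [0,127] is an
+   assumption of this sketch; the text above only states that such
+   values are not possible for in-band sample descriptions.
+
```python
def classify_sidx(sidx):
    """Map an 8-bit SIDX value to 'dynamic', 'static' or 'reserved'."""
    if not 0 <= sidx <= 255:
        raise ValueError("SIDX is an 8-bit field")
    if sidx <= 127:
        return "dynamic"   # sent in-band, closed interval [0,127]
    if 129 <= sidx <= 254:
        return "static"    # sent out-of-band, closed interval [129,254]
    return "reserved"      # 128 and 255


def parse_type5(unit):
    """Return (sidx, description) for a TYPE 5 unit, or None to discard."""
    if len(unit) < 4 or unit[0] & 0x07 != 5:
        return None
    length = int.from_bytes(unit[1:3], "big")
    if length <= 3 or len(unit) < 1 + length:
        return None  # LEN MUST be greater than three
    sidx = unit[3]
    if classify_sidx(sidx) != "dynamic":
        return None  # assumption: only dynamic SIDX values are accepted
    # LEN counts the LEN field (2), SIDX (1) and the sample description.
    return sidx, unit[4:1 + length]
```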
+
+   The next section specifies how SIDX values are calculated.
+
+4.2.  Buffering of Sample Descriptions
+
+   The buffering of sample descriptions is a matter of the client's
+   timed text codec implementation.  In order to work properly, this
+   payload format requires that:
+
+     o Static sample descriptions MUST be buffered at the client, at
+       least, for the duration of the session.
+
+     o If dynamic sample descriptions are used, their buffering and
+       update of the SIDX values MUST follow the mechanism described in
+       the next section.
+
+4.2.1.  Dynamic SIDX Wraparound Mechanism
+
+   The use of dynamic sample descriptions by senders is OPTIONAL.
+   However, if they are used, senders MUST implement this mechanism.
+   Receivers MUST always implement it.
+
+   Dynamic SIDX values remain active either during the entire duration
+   of the session (if used just once) or in different intervals of it
+   (if used more than once).
+
+        Note: In the following, SIDX means dynamic SIDX.
+
+   For choosing the wraparound mechanism, the following rationale was
+   used: There are 128 dynamic SIDX values possible, [0..127].  If one
+   chooses to allow a maximum of 127 to be used as dynamic SIDXs, then
+   any reordered packet with a new sample description would make the
+   mechanism fail.  For example, if the last packet received is SIDX=5,
+   then all values except SIDX=6 (127 in total) would be "active".
+   Now, if a reordered packet arrives with a new description, SIDX=9,
+   it will be mistakenly discarded, because SIDX=9 is, at that moment,
+   marked as "active" and active sample descriptions shall not be
+   overwritten.  Therefore, a "guard interval" is introduced.  This
+   guard interval reduces the number of active SIDXs at any point in
+   time to 64.  Although most timed text applications will probably
+   need fewer than 64 sample descriptions during a session (in total),
+   a wraparound mechanism to handle the need for more is described
+   here.
+
+   Thereby, a sliding window of 64 active SIDX values is used.  Values
+   within the window are "active"; all others are marked "inactive".  An
+   SIDX value becomes active if at least one sample description
+   identified by that SIDX has been received.  Since sample descriptions
+   MAY be sent redundantly, it is possible that a client receives a
+   given SIDX several times.  However, active sample descriptions SHALL
+   NOT be overwritten: The receiver SHALL ignore redundant sample
+   descriptions and it MUST use the already cached copy.  The "guard
+   interval" of (64) inactive values ensures that the correct
+   association SIDX <-> sample description is always used.
+
+        Informative note: As for the "guard interval" value itself, 64
+        as 128/2 was considered simple enough while still meeting the
+        expected maximum number of sample descriptions.  Besides that,
+        there's no other motivation for choosing 64 or a different
+        value.
+
+   The following algorithm is used to buffer dynamic sample descriptions
+   and to maintain the dynamic SIDX values:
+
+   Let X be the last SIDX received that updated the range of active
+   sample descriptions.  Let Y be a value within the allowed range for
+   dynamic SIDX: [0,127], and different from X.  Let Z be the SIDX of
+   the last received sample description.  Then:
+
+     1. Initialize all dynamic SIDX values as inactive.  For stored
+        contents, read the sample description index in the Sample to
+        Chunk box ("stsc") for that sample.  For live streaming, the
+        first value MAY be zero or any other value in the interval
+        above.  Go to step 2.
+
+     2. The first in-band sample description, with SIDX=Z, is received
+        and stored; set X=Z.  Go to step 3.
+
+     3. Any SIDX within the interval [X+1 modulo(128), X+64 modulo(128)]
+        is marked as inactive, and any corresponding sample description
+        is deleted.  Any SIDX within the interval [X+65 modulo(128), X]
+        is set active.  Go to step 4 (wait state).
+
+     4. Wait for next sample description.  Once the client is
+        initialized, the interval of active SIDX values MUST change
+        whenever a sample description with an SIDX value in the inactive
+        set is received.  That is, upon reception of a sample
+        description with SIDX=Z, do the following:
+
+        a. If Z is in the (closed) interval [X+1 modulo(128), X+64
+           modulo(128)] then set X=Z, store the sample description, and
+           go to step 3.
+
+        b. Else, Z must be in the interval [X+65 modulo(128), X], thus:
+
+            i. If SIDX=Z is not stored, then store the sample
+               description. Go to beginning of step 4 (wait state).
+           ii. Else, go to the beginning of step 4 (wait state).
+
+        Informative note: Any SIDX value in the interval [0,127] may be
+        sent.  For example, if [64..127] is the current active set and
+        SIDX=0 is sent, a new sample description is defined (0) and an
+        old one deleted (64); thus [65..127] and [0] are active.
+        Similarly, one could now send SIDX=64, thus inverting the
+        active and inactive sets.
+
+   Example:
+        If X=4, any SIDX in the interval [5,68] is inactive.  Active
+        SIDX values are in the complementary interval [69,127] plus
+        [0,4].  For example, if the client receives a SIDX=6, then the
+        active interval is now different: [0,6] plus [71,127].  If the
+        received SIDX is in the current active interval, no change SHALL
+        be applied.
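+
+        Informative note: The algorithm above can be sketched in Python
+        as follows.  This is a minimal illustration, not a normative
+        implementation; the class name DynamicSidxBuffer and its
+        methods are hypothetical and are not defined by this payload
+        format.  Sample descriptions are treated as opaque byte
+        strings.
+

```python
class DynamicSidxBuffer:
    """Receiver-side cache of dynamic sample descriptions (SIDX 0..127).

    Hypothetical helper for illustration only.
    """

    MOD = 128    # size of the dynamic SIDX space
    GUARD = 64   # number of inactive values ahead of X (guard interval)

    def __init__(self):
        self.x = None     # X: last SIDX that updated the active range
        self.store = {}   # SIDX -> cached sample description

    def is_active(self, sidx):
        """True if sidx lies in the interval [X+65 mod 128, X]."""
        if self.x is None:
            return False  # step 1: all values start out inactive
        return (self.x - sidx) % self.MOD < self.MOD - self.GUARD

    def receive(self, z, description):
        """Handle a sample description with SIDX=z.

        Returns True if the description was stored, False if it was a
        redundant copy that was ignored.
        """
        if not 0 <= z < self.MOD:
            raise ValueError("not a dynamic SIDX value")
        if self.x is None or not self.is_active(z):
            # Steps 2 and 4a: z is in the inactive set, so X moves to z.
            self.x = z
            # Step 3: delete descriptions that are now in [X+1, X+64].
            self.store = {s: d for s, d in self.store.items()
                          if self.is_active(s)}
            self.store[z] = description
            return True
        if z not in self.store:
            # Step 4b(i): active but not cached yet, so store it.
            self.store[z] = description
            return True
        # Step 4b(ii): active and cached; never overwrite.
        return False
```

+
+        With X=4, is_active(69) and is_active(0) hold while
+        is_active(5) does not; after receive(6, ...) the active set
+        becomes [71,127] plus [0,6], matching the example above.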
+
+4.3.  Finding Payload Header Values in 3GP Files
+
+   For the purpose of streaming timed text contents, some values in the
+   boxes contained in a 3GP file are mapped to fields of this payload
+   header.  This section explains where to find those values.
+
+   Additionally, for the duration and sample description indexes,
+   extension mechanisms are provided.  All senders MUST implement the
+   extension mechanisms described herein.
+
+   If the file is streamed out of a 3GP file, the following guidelines
+   SHALL be followed.
+
+        Note: All fields in the objects (boxes) of a 3GP file are found
+        in network byte order.
+
+   Information obtained from the Sample Table Box (stbl):
+
+        o Sample Descriptions and Sample Description length: The Sample
+          Description box (stsd, inside the stbl) contains the sample
+          descriptions.  For timed text media, each element of stsd is a
+          timed text sample entry (type "tx3g").
+
+          The (unsigned) 32 bits of the "size" field in the stsd box
+          represent the length (in bytes) of the sample description, as
+          carried in TYPE 5 units.  On the other hand, the LEN field of
+          TYPE 5 units is restricted to 16 bits.  Therefore, if the
+          value of "size" is greater than (2^16-1-3)[bytes], then the
+          sample description SHALL NOT be streamed with this payload
+          format.  There is no extension mechanism defined in this case,
+          since fragmentation of sample descriptions is not defined
+          (sample descriptions are typically up to some 200 bytes in
+          size).  Note: The three (3) accounts for the TYPE 5 header
+          fields included in the LEN value.
+
+        o SDUR from the Decoding Time to Sample Box (stts).  The
+          (unsigned) 32 bits of the "sample delta" field are used for
+          calculating SDUR.  However, since the SDUR field is only 3
+          bytes long, text samples with duration values larger than
+          (2^24-1)/(timestamp clockrate)[seconds] cannot be streamed
+          directly.  The solution is simple: Copies of the corresponding
+          text sample SHALL be sent.  Thereby, the timestamp and
+          duration values SHALL be adjusted so that a continuous display
+          is guaranteed as if just one sample had been sent.
+          That is, a sample with timestamp TS and duration SDUR can be
+          sent as two samples having timestamps TS1 and TS2 and
+          durations SDUR1 and SDUR2, such that TS1=TS, TS2=TS1+SDUR1,
+          and SDUR=SDUR1+SDUR2.
+
+        o Text sample length from the Sample Size Box (stsz).  The
+          (unsigned) 32 bits of the "sample size" or "entry size" (one
+          of them, depending on whether the sample size is fixed or
+          variable) indicate the length (in bytes) of the 3GP text
+          sample.  For obtaining the length of the (actual) streamed
+          text sample, the lengths of the text string byte count (2
+          bytes) and, in case of UTF-16 strings, the length of the BOM
+          (also 2 bytes) SHALL be deducted.  This is illustrated in
+          Figure 9.
+
+          Text Sample according to 3GPP TS 26.245
+
+                               TEXT SAMPLE (length=stsz)
+                 .--------------------------------------------------.
+                /                                                    \
+                               TEXT STRING  (length=TBC)
+                    .------------------------------------.
+                   /                                      \
+                TBC BOM                                     MODIFIERS
+               +---+---+----------------------------------+-----------+
+                                     ||
+                                     ||    TBC BOM  -> TLEN  field
+                                     ||   +---+---+    U bit
+                                     ||
+                                     \/
+
+          Text Sample according to this Payload Format
+
+                                 TEXT SAMPLE (length=SLEN w/o TBC,BOM)
+                        .--------------------------------------------.
+                       /                                              \
+                                     TEXT STRING (length=TLEN)
+                        .--------------------------------.
+                       /                                  \
+                                    TEXT STRING             MODIFIERS
+                       +----------------------------------+-----------+
+
+              KEY:
+              TBC = Text string Byte Count
+              BOM = Byte Order Mark
+
+                    Figure 9.  Text sample composition
+
+          Moreover, since the LEN field in the TYPE 1 unit header is 16
+          bits long, text samples larger than (2^16-1-8) [bytes] SHALL
+          NOT be streamed.  In this case too, no extension mechanism is
+          defined, because this maximum is considered sufficient for
+          the targeted streaming applications.  (Note: The eight (8)
+          accounts for the TYPE 1 header fields included in the LEN
+          value.)
+
+        o SIDX from the Sample to Chunk Box (stsc): The stsc Box is used
+          to find samples and their corresponding sample descriptions.
+          These are referenced by the "sample description index", a
+          32-bit (unsigned) integer.  If possible, these indices may be
+          directly mapped to the SIDX field.  However, there are several
+          cases where this may not be possible:
+
+                  a) The total number of indices used is greater than
+               the number of indices available, i.e., if the static
+               sample descriptions are more than 127 or the dynamic ones
+               are more than 64.
+
+                  b) The original SIDX value ranges do not fit in the
+               allowed ranges for static (129-254) or dynamic (0-127)
+               values.
+
+          Therefore, when assigning SIDX values to the sample
+          descriptions, the following guidelines are provided:
+
+          o    Static sample descriptions can simply be assigned
+               consecutive values within the range 129-254 (closed
+               interval).  This range should be more than sufficient
+               for static sample descriptions.
+
+          o    As for dynamic sample descriptions:
+
+                  a) Streams that use fewer than 64 dynamic sample
+               descriptions SHOULD use consecutive values for SIDX
+               anywhere in the range 0-127 (closed interval).
+
+                  b) For streams with more than 64 sample descriptions,
+               the SIDX values MUST be assigned in usage order, and if
+               any sample description is to be used after it has been
+               set inactive, it will need to be re-sent and assigned a
+               new SIDX value (according to the algorithm in Section
+               4.2.1).
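+
+        Informative note: The SDUR extension mechanism described above
+        (sending copies of a text sample whose duration does not fit
+        in the 3-byte SDUR field) can be sketched as follows.  The
+        function name split_long_sample is hypothetical.  Durations
+        are assumed to be expressed already in units of the RTP
+        timestamp clock, and wrapping at 2^32 is an assumption that
+        reflects the 32-bit RTP timestamp field.
+

```python
MAX_SDUR = 2**24 - 1   # largest value the 3-byte SDUR field can carry


def split_long_sample(ts, sample_delta):
    """Return (timestamp, SDUR) pairs for the copies of one text sample.

    The copies are contiguous: each starts where the previous one ends,
    and their durations add up to sample_delta.
    """
    copies = []
    while sample_delta > MAX_SDUR:
        copies.append((ts, MAX_SDUR))
        ts = (ts + MAX_SDUR) % 2**32   # RTP timestamps are 32 bits wide
        sample_delta -= MAX_SDUR
    copies.append((ts, sample_delta))
    return copies
```

+
+        A sample that fits in SDUR is returned unchanged as a single
+        pair; a longer one yields TS2=TS1+SDUR1 and so on, so that the
+        display is continuous.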
+
+   Information obtained from the Media Data Box:
+
+        o Text strings, TLEN, U bit, and modifiers from the Media Data
+          Box (mdat).  Text strings, 16-bit text string byte count, Byte
+          Order Mark (BOM, indicating UTF encoding), and modifier boxes
+          can be found here.
+
+          For TYPE 1 units, the value of TLEN is extracted from the text
+          string byte count that precedes the text string in the text
+          sample, as stored in the 3GP file.  If UTF-16 encoding is
+          used, two (2) more bytes have to be deducted from this byte
+          count beforehand, in order to exclude the BOM.  See Figure 9.
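+
+        Informative note: The mapping shown in Figure 9 can be sketched
+        as follows.  The function name to_payload_fields is
+        hypothetical.  Rejecting a little-endian BOM is an assumption
+        of this sketch, motivated by the statement in Section 4.5 that
+        only big-endian UTF-16 is supported; a sender could instead
+        convert such strings.
+

```python
import struct


def to_payload_fields(sample):
    """Split a stored 3GP text sample into (U, TLEN, text, modifiers).

    `sample` is the raw text sample from the Media Data Box: a 16-bit
    text string byte count (TBC), the text string, then the modifiers.
    """
    (tbc,) = struct.unpack_from(">H", sample, 0)   # network byte order
    text = sample[2:2 + tbc]
    modifiers = sample[2 + tbc:]
    u_bit = 0
    if text[:2] == b"\xff\xfe":
        # Assumption of this sketch: little-endian strings are rejected.
        raise ValueError("little-endian UTF-16 is not supported")
    if text[:2] == b"\xfe\xff":
        u_bit = 1          # UTF-16: signalled by the U bit instead
        text = text[2:]    # the BOM itself is not streamed
    tlen = len(text)       # TBC minus 2 when a BOM was present
    return u_bit, tlen, text, modifiers
```

+
+        The length of the streamed text sample is then TLEN plus the
+        length of the modifiers, i.e., the stsz value minus the 2-byte
+        TBC and, for UTF-16, minus the 2-byte BOM.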
+
+4.4.  Fragmentation of Timed Text Samples
+
+   This section explains why text samples may have to be fragmented and
+   discusses some of the possible approaches to doing it.  A solution is
+   proposed together with rules and recommendations for fragmenting and
+   transporting text samples.
+
+   3GPP Timed Text applications are expected to operate at low bitrates.
+   This fact, added to the small size of timed text samples (typically
+   one or two hundred bytes) makes fragmentation of text samples a rare
+   event.  Samples should usually fit within the MTU of the network
+   path in use.
+
+   Nevertheless, some text strings (e.g., ending roll in a movie) and
+   some modifier boxes (i.e., for hyperlinks, for karaoke, or for
+   styles) may become large.  This may also apply for future modifier
+   boxes.  In such cases, the first option to consider is whether it is
+   possible to adjust the encoding (e.g., the size of sample) in such a
+   way that fragmentation is avoided.  If it is, this is preferred to
+   fragmentation and SHOULD be done.
+
+   Otherwise, if this is not possible or other constraints prevent it,
+   fragmentation MAY be used, and the basic guidelines given in this
+   document MUST be followed:
+
+   o It is RECOMMENDED that text samples be fragmented as seldom as
+     possible, i.e., the least possible number of fragments is created
+     out of a text sample.
+
+   o If spare bitrate and free space in the payload are available,
+     sample descriptions (if at hand) SHOULD be aggregated.
+
+   o Text strings MUST be split at character boundaries; see TYPE 2
+     header.  Otherwise, it is not possible to display the text
+     contents of a fragment if a previous fragment was lost.  As a
+     consequence, text
+     string fragmentation requires knowledge of the UTF-8/UTF-16
+     encoding formats to determine character boundaries.
+
+   o Unlike text strings, the modifier boxes are NOT REQUIRED to be
+     split at meaningful boundaries.  However, it is RECOMMENDED that
+     this be done whenever possible.  This decreases the effects of
+     packet loss.  This payload format does not ensure that partially
+     received modifiers are applied to text strings.  If only part of
+     the modifiers is received, it is an application issue how to deal
+     with these, i.e., whether or not to use them.
+
+        Informative note: Ensuring that partially received modifiers can
+        be applied to text strings in all cases (for all modifier types
+        and for all fragment loss constellations) would place additional
+        requirements on the payload format.  In particular, this would
+        require that: a) senders understand the semantics of the
+        modifier boxes and b) specific fragment headers for each of the
+        modifier boxes are defined, in addition to the payload formats
+        defined below.  Understanding the modifiers semantics means
+        knowing, e.g., where each modifier starts and ends, which text
+        fragments are affected, which modifiers may or may not be split,
+        or what the fields indicate.  This is necessary to be able to
+        split the modifiers in such a way that each fragment can be
+        applied independently of previous packet losses.  This would
+        require a more intelligent fragmentation entity and more complex
+        headers.  Given the low probability of fragmentation and the
+        desire to keep the requirements low, it does not seem reasonable
+        to specify such modifier box specific headers.
+
+   o Modifier and text string fragments SHOULD be protected against
+     packet losses, i.e., using FEC [7], retransmission [11], repetition
+     (Section 5), or an equivalent technique.  This minimizes the
+     effects of packet loss.
+
+   o An additional requirement when fragmenting text samples is that the
+     start of the modifiers MUST be indicated using the payload header
+     defined for that purpose, i.e., a TYPE 3 unit MUST be used (see
+     Section 4.1.4).  This enables a receiver to detect the start of the
+     modifiers as long as there are not two or more consecutive packet
+     losses.
+
+   o Finally, sample descriptions SHALL NOT be fragmented because they
+     contain important information that may affect several text samples.
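+
+        Informative note: Splitting a UTF-8 text string at character
+        boundaries can be sketched as follows.  The function name
+        split_utf8 and the max_bytes parameter are hypothetical; the
+        sketch relies only on the fact that UTF-8 continuation bytes
+        have the form 10xxxxxx.  UTF-16 strings would need an
+        analogous check for surrogate pairs, which is not shown here.
+

```python
def split_utf8(text, max_bytes):
    """Split UTF-8 bytes into chunks of at most max_bytes each, never
    cutting inside a multi-byte character."""
    if max_bytes < 4:
        raise ValueError("max_bytes must be able to hold one character")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_bytes, len(text))
        # If the cut would land on a continuation byte, move it back to
        # the first byte of that character.
        while end < len(text) and end > start and text[end] & 0xC0 == 0x80:
            end -= 1
        if end == start:
            raise ValueError("input is not valid UTF-8")
        chunks.append(text[start:end])
        start = end
    return chunks
```

+
+        Each chunk returned this way can be decoded on its own, so a
+        received fragment remains displayable even if the preceding
+        fragment was lost.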
+
+4.5.  Reassembling Text Samples at the Receiver
+
+   The payload headers defined in this document allow reassembling
+   fragmented text samples.  For this purpose, the standard RTP
+   timestamp, the duration field (SDUR), and the fields TOTAL/THIS in
+   the payload headers are used.
+
+   Units that belong to the same text sample MUST have the same
+   timestamp.  TYPE 5 units do not comply with this rule since they are
+   not part of any particular text sample.
+
+   The process for collecting the different fragments (units) of a text
+   sample is as follows:
+
+     1. Search for units having the same timestamp value, i.e., units
+        that belong to the same text sample or sample descriptions that
+        shall become available at that time instant.  If several units
+        of the same sample are repeated, only one of them SHALL be used.
+        Repeated units are those that have the same timestamp and the
+        same values for TOTAL/THIS.
+
+                Note that, as mentioned in Section 4.1.1, the receiver
+                SHALL ignore units with unrecognized TYPE value.
+                However, the RTP header fields and the rest of the units
+                (if any) in the payload are still useful.
+
+     2. Check within this set whether any of the units from the text
+        sample is missing.  This is done using the TOTAL and THIS
+        fields; the TOTAL field indicates how many fragments were
+        created out of the text sample, and the THIS field indicates the
+        position of this fragment in the text sample.  As a result of
+        this operation, two outcomes are possible:
+
+          a. No fragment is missing.  Then, the THIS field SHALL be used
+             to order the fragments and reassemble the text sample
+             before forwarding it to the decoding application.  Special
+             care SHALL be taken when reassembling the text string as
+             indicated in step 4 below.
+
+          b. One or more fragments are missing: Check whether each
+             missing fragment belongs to the text string or to the
+             modifiers.  TYPE 2 units identify text string fragments,
+             and TYPE 3 and 4 units identify modifier fragments:
+
+              i. If the fragment or fragments missing belong to the text
+                 string and the modifiers were received complete, then
+                 the received text characters may, at least, be
+                 displayed as plain text.  Some modifiers may only be
+                 applied as long as it is possible to identify the
+                 character numbers, e.g., if only the last text string
+                 fragment is lost.  This is the case for modifiers
+                 defining specific font styles ('styl'), highlighted
+                 characters ('hlit'), karaoke feature ('krok'), and
+                 blinking characters ('blnk').  Other modifiers such as
+                 'dlay' or 'tbox' can be applied without the knowledge
+                 of the character number.  It is an application issue to
+                 decide whether or not to apply the modifiers.
+
+             ii. If the fragment missing belongs to the modifiers and
+                 the text strings were received complete, then the
+                 incomplete modifiers may be used.  The text string
+                 SHOULD at least be displayed as plain text.  As
+                 mentioned in Section 4.4, modifiers may be split
+                 without observing meaningful boundaries.  Hence, it
+                 may not always be possible to make use of partially
+                 received modifiers.  To avoid this, it is RECOMMENDED
+                 that modifiers be split at meaningful boundaries.
+
+            iii. A third possibility is that it is not possible to
+                 discern whether modifiers or text strings were received
+                 complete.  For example, if the TYPE 3 unit of a sample
+                 plus the following or preceding packet is lost, there
+                 is no way for the RTP receiver to know if one or both
+                 packets lost belong to the modifiers or if there are
+                 also some missing text strings.  Repetition, FEC,
+                 retransmission, or other protection mechanisms as per
+                 section 4.6 are RECOMMENDED to avoid this situation.
+
+             iv. Finally, if it is certain that neither text strings
+                 nor modifiers were received complete, then the text
+                 strings and the modifiers may be rendered partially
+                 or may be discarded.  This is an application choice.
+
+     3. Sample descriptions can be directly associated with the
+        reassembled text samples, via the sample description index
+        (SIDX).
+
+     4. Reassembling of text strings: Since the text strings transported
+        in RTP packets MUST NOT include any byte order mark (BOM), the
+        receiver MUST prepend it to the reassembled UTF-16 string before
+        handing it to the timed text decoder (see Figure 9).  The value
+        of the BOM is 0xFEFF because only big endian serialization of
+        UTF-16 strings is supported by this payload format.
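+
+        Informative note: The collection process above can be sketched
+        as follows for the fragments of a single text sample.  The
+        function name reassemble and the tuple layout (unit type,
+        TOTAL, THIS, data) are hypothetical.  The sketch assumes that
+        the caller has already grouped units by RTP timestamp and that
+        THIS counts fragments starting from 1; both are assumptions
+        of this example.  Partial rendering (case 2b) is left to the
+        application and is reported here simply as None.
+

```python
def reassemble(fragments, utf16=False):
    """Reassemble one text sample from its fragment units.

    fragments: iterable of (unit_type, total, this, data) tuples that
    all carry the same RTP timestamp.  Returns (text, modifiers), or
    None if no fragments were given or a fragment is missing.
    """
    by_pos = {}
    total = None
    for unit_type, tot, this, data in fragments:
        if unit_type not in (2, 3, 4):
            continue   # only TYPE 2, 3 and 4 units are fragments
        total = tot
        # Repeated units share timestamp and TOTAL/THIS: keep just one.
        by_pos.setdefault(this, (unit_type, data))
    if total is None or set(by_pos) != set(range(1, total + 1)):
        return None    # step 2b: at least one fragment is missing
    ordered = [by_pos[pos] for pos in sorted(by_pos)]   # order by THIS
    text = b"".join(d for t, d in ordered if t == 2)
    modifiers = b"".join(d for t, d in ordered if t != 2)
    if utf16:
        text = b"\xfe\xff" + text   # step 4: prepend the BOM
    return text, modifiers
```

+
+        The returned text string, with the BOM restored for UTF-16, is
+        what would be handed to the timed text decoder together with
+        the sample description selected by SIDX.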
+
+4.6.  On Aggregate Payloads
+
+   Units SHOULD be aggregated to avoid overhead, whenever possible.  The
+   aggregate payloads MUST comply with one of the following ordered
+   configurations:
+
+   1. Zero or more sample descriptions (TYPE 5) followed by zero or more
+      whole text samples (TYPE 1 units).  At least one unit of either
+      type MUST be present.
+
+   2. Zero or more sample descriptions followed by zero or one modifier
+      fragment, either TYPE 3 or TYPE 4.  At least one unit MUST be
+      present.
+
+   3. Zero or more sample descriptions, followed by zero or one text
+      string fragment (TYPE 2), followed by zero or one TYPE 3 unit.  If
+      a TYPE 2 unit and a TYPE 3 unit are present, then they MUST belong
+      to the same text sample.  At least one unit MUST be present.
+
+   Some observations:
+
+   o Aggregates other than the ones listed above SHALL NOT be used.
+
+   o Sample descriptions MUST be placed in the aggregate payload before
+     the occurrence of any non-TYPE 5 units.
+
+   o Correct reception of TYPE 5 units is important since their contents
+     may be referenced by several other units in the stream.
+
+     Receivers are unable to use text samples until their corresponding
+     sample descriptions are received.  Accordingly, a sender SHOULD
+     send multiple copies of a sample description to ensure reliability
+     (see Section 5).  Receivers MAY use payload-specific feedback
+     messages [21] to tell a sender that they have received a particular
+     sample description.
+
+   o Regarding timestamp calculation: In general, the rules for
+     calculating the timestamp of units in an aggregate payload depend
+     on the type of unit.  Based on the possible constellations for
+     aggregate payloads, as above, we have:
+
+           o Sample descriptions MUST receive the RTP timestamp of the
+             packet in which they are included.
+
+             Note that for TYPE 5 units, the timestamp actually does not
+             represent the instant when they are played out, but instead
+             the instant at which they become available for use.
+
+           o For the first configuration: The first TYPE 1 unit receives
+             the RTP timestamp.  The timestamp of any subsequent TYPE 1
+             unit MUST be obtained by adding sample duration and
+             timestamp, both of the preceding TYPE 1 unit.
+
+           o For the second and third configuration, all units, TYPE 2,
+             3, and 4, MUST receive the RTP timestamp.
+
+           Refer to detailed examples on the timestamp calculation
+           below.
+
+   o As per configuration 3 above, a payload MAY contain several
+     fragments of one (and only one) text sample.  If it does, then
+     exactly one TYPE 2 unit followed by exactly one TYPE 3 unit is
+     allowed in the same payload.  This is in line with RFC 3640 [12],
+     Section 2.4, which explicitly disallows combining fragments of
+     different samples in the same RTP payload.  Note that, in this
+     special case, no timestamp calculation is needed.  That is, the RTP
+     timestamp of both units is equal to the timestamp in the packet's
+     RTP header.
+
+   o Finally, note that the use of empty text samples allows for
+     aggregating non-consecutive TYPE 1 units in the same payload.  Two
+     text samples, with timestamps TS1 and TS3 and durations SDUR1 and
+     SDUR3, are not consecutive if TS1+SDUR1 < TS3.  A solution for
+     this is to include an empty TYPE 1 unit with duration SDUR2
+     between them, such that TS2+SDUR2 = TS1+SDUR1+SDUR2 = TS3.
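+
+        Informative note: The three ordered configurations above can
+        be checked with a sketch such as the following.  The function
+        name is_valid_aggregate is hypothetical, and the payload is
+        represented simply as the list of unit TYPE values in the
+        order in which they appear.
+

```python
def is_valid_aggregate(types):
    """Check unit TYPE values (in payload order) against the three
    ordered configurations for aggregate payloads."""
    types = list(types)
    if not types:
        return False                  # at least one unit is required
    i = 0
    while i < len(types) and types[i] == 5:
        i += 1                        # leading sample descriptions
    rest = types[i:]
    if not rest:
        return True                   # sample descriptions only
    if all(t == 1 for t in rest):
        return True                   # configuration 1: whole samples
    if rest in ([3], [4]):
        return True                   # configuration 2: one modifier fragment
    return rest in ([2], [2, 3])      # configuration 3
```

+
+        For instance, [5, 1, 1, 1] and [2, 3] are accepted, whereas
+        [1, 5] (sample description after another unit) and [2, 2]
+        (two text string fragments) are rejected.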
+
+   Some examples of aggregate payloads are illustrated in Figure 10.
+   (Note: The figure is not scaled.)
+
+      N/A    TS1   TS2     TS3
+    +------+-----+------+-----+
+    |TYPE5 |TYPE1|TYPE1 |TYPE1|
+    +------+-----+------+-----+
+      N/A   sdur1  sdur2  sdur3
+
+                                   N/A    TS4
+                                 +-----+-------+
+                                 |TYPE5| TYPE 1|                   a)
+                                 +-----+-------+
+                                   N/A   sdur4
+
+                                        TS4         TS4    TS4
+                                 +--------------+ +--------------+
+                                 |    TYPE2     | |TYPE2 |TYPE 3 | b)
+                                 +--------------+ +--------------+
+                                       sdur4       sdur4   sdur4
+
+                                        TS4             TS4
+                                 +--------------+ +--------------+
+                                 | TYPE2| TYPE 3| |     TYPE4    | c)
+                                 +--------------+ +--------------+
+                                   sdur4  sdur4        sdur4
+
+    |----------PAYLOAD 1------|  |--PAYLOAD 2---| |--PAYLOAD 3---|
+               rtpts1               rtpts2           rtpts3
+
+        KEY:
+        TSx    = Text Sample x
+        rtptsy = the standard RTP timestamp for PAYLOAD y
+        sdurx  = the duration of Text Sample x
+        N/A    =  not applicable
+
+                  Figure 10.  Example aggregate payloads
+
+   In Figure 10, four text samples (TS1 through TS4) are sent using
+   three RTP packets.  These configurations have been chosen to show how
+   the 5 TYPE headers are used.  Additionally, three different
+   possibilities for the last text sample, TS4, are depicted: a), b),
+   and c).
+
+   In Figure 11, option b) from Figure 10 is chosen to illustrate how
+   the timestamp for each unit is found.
+
+      N/A    TS1   TS2    TS3        TS4            TS4    TS4
+    +------+-----+------+-----+  +--------------+ +--------------+
+    |TYPE5 |TYPE1|TYPE1 |TYPE1|  |    TYPE2     | |TYPE2 |TYPE 3 |
+    +------+-----+------+-----+  +--------------+ +--------------+
+      N/A   sdur1 sdur2  sdur3         sdur4       sdur4   sdur4
+
+     (#1)    (#2) (#3)   (#4)           (#5)        (#6)    (#7)
+
+    |----------PAYLOAD 1------|  |--PAYLOAD 2---| |--PAYLOAD 3---|
+               rtpts1               rtpts2           rtpts3
+
+               Figure 11.  Selected payloads from Figure 10
+
+   Assuming TSx means Text Sample x, rtptsy is the standard RTP
+   timestamp for PAYLOAD y, and sdurx is the duration of Text Sample x,
+   the timestamp for unit #z, ts(#z), can be found as the sum of rtptsy
+   and the cumulative sum of the durations of preceding units in that
+   payload (except in the case of PAYLOAD 3, as per configuration 3
+   above).  Thus, we have:
+
+          1. for the units in the first aggregate payload, PAYLOAD 1:
+
+                        ts(#1) = rtpts1
+                        ts(#2) = rtpts1
+                        ts(#3) = rtpts1 + sdur1
+                        ts(#4) = rtpts1 + sdur1 + sdur2
+
+           Note that the TYPE 5 unit and the first TYPE 1 unit both have
+           the RTP timestamp.
+
+          2. for PAYLOAD 2:
+
+                        ts(#5) = rtpts2
+
+          3. for PAYLOAD 3:
+
+                        ts(#6) = ts(#7) = rtpts2 = rtpts3
+
+           According to configuration 3 above, the TYPE 2 and the TYPE 3
+           units shall belong to the same sample.  Hence, rtpts3 must be
+           equal to rtpts2.  For the same reason, the value of SDUR is
+           not used to calculate the timestamp of the next unit.
+
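+        Informative note: The timestamp rules illustrated above can be
+        sketched as follows.  The function name unit_timestamps and
+        the (unit type, SDUR) tuple layout are hypothetical; wrapping
+        at 2^32 is an assumption reflecting the 32-bit RTP timestamp
+        field.
+

```python
def unit_timestamps(rtp_ts, units):
    """Return one timestamp per unit of an aggregate payload.

    units: list of (unit_type, sdur) tuples in payload order; sdur is
    ignored for every unit type except TYPE 1.
    """
    timestamps = []
    next_ts = rtp_ts
    for unit_type, sdur in units:
        if unit_type == 1:
            # The first TYPE 1 unit gets the RTP timestamp; each later
            # one starts where the preceding TYPE 1 unit ends.
            timestamps.append(next_ts)
            next_ts = (next_ts + sdur) % 2**32
        else:
            # TYPE 2, 3, 4 and 5 units all take the RTP timestamp.
            timestamps.append(rtp_ts)
    return timestamps
```

+
+        For PAYLOAD 1 with rtpts1=1000 and durations 10, 20, and 30,
+        this yields 1000, 1000, 1010, and 1030 for units #1 to #4.
+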
+4.7.  Payload Examples
+
+   Some examples of payloads using the defined headers are shown below:
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |V=2|P|X| CC    |M|    PT       |        sequence number        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                           timestamp                           |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |           synchronization source (SSRC) identifier            |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |U|   R   |TYPE1|       LEN  (always >=8)       |    SIDX       |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                     SDUR                      |     TLEN      |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |    TLEN       |                                               |
+      +---------------+                                               |
+      |                  text string (no.bytes=TLEN)                  |
+      |                                                               |
+      |                                                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                   modifiers   (no.bytes=LEN - 8 - TLEN)       |
+      |                                                               |
+      |                                                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |U|   R   |TYPE1|       LEN  (always >=8)       |    SIDX       |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                     SDUR                      |     TLEN      |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |    TLEN       |                                               |
+      +---------------+                                               |
+      |                  text string (no.bytes=TLEN)                  |
+      |                                                               |
+      |                                                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                   modifiers   (no.bytes=LEN - 8 - TLEN)       |
+      |                                               +-+-+-+-+-+-+-+-+
+      |                                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+            Figure 12.  A payload carrying two TYPE 1 units
+
+   In Figure 12, an RTP packet carrying two TYPE 1 units is depicted.
+   It can be seen how the length fields LEN and TLEN can be used to find
+   the start of the next unit (LEN), the start of the modifiers (TLEN),
+   and the length of the modifiers (LEN - 8 - TLEN).
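+
+   A receiver-side walk over such a payload might look like the
+   following Python sketch.  It is not a reference implementation.  It
+   assumes that LEN counts the bytes from the LEN field itself to the
+   end of the unit, which is what the annotations "LEN >= 8" and
+   "LEN - 8 - TLEN" in Figure 12 imply; the function names and the
+   dictionary layout are hypothetical.
+
```python
import struct


def parse_type1_units(payload):
    """Split an aggregate payload into its TYPE 1 units."""
    units = []
    pos = 0
    while pos < len(payload):
        if len(payload) - pos < 9:
            raise ValueError("truncated TYPE 1 unit header")
        first = payload[pos]
        u_bit = first >> 7          # U: most significant bit
        unit_type = first & 0x07    # TYPE: three least significant bits
        if unit_type != 1:
            raise ValueError("this sketch only handles TYPE 1 units")
        length, sidx = struct.unpack_from("!HB", payload, pos + 1)
        sdur = int.from_bytes(payload[pos + 4:pos + 7], "big")
        tlen = int.from_bytes(payload[pos + 7:pos + 9], "big")
        # Assumption: LEN counts from the LEN field to the end of the unit.
        end = pos + 1 + length
        if length < 8 + tlen or end > len(payload):
            raise ValueError("inconsistent LEN/TLEN")
        text_start = pos + 9
        units.append({
            "u": u_bit, "sidx": sidx, "sdur": sdur,
            "text": payload[text_start:text_start + tlen],
            "modifiers": payload[text_start + tlen:end],
        })
        pos = end                   # LEN locates the next unit
    return units


def build_type1_unit(sidx, sdur, text, modifiers=b""):
    """Build one TYPE 1 unit (U and R bits zero) for the example below."""
    length = 8 + len(text) + len(modifiers)
    return (bytes([0x01]) + struct.pack("!HB", length, sidx)
            + sdur.to_bytes(3, "big") + len(text).to_bytes(2, "big")
            + text + modifiers)


# Two units as in Figure 12; the modifier bytes are placeholders, not
# real modifier boxes.
payload = (build_type1_unit(1, 3000, b"Hello", b"\x01\x02\x03")
           + build_type1_unit(1, 1500, b"world"))
parsed = parse_type1_units(payload)
```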
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |V=2|P|X| CC    |M|    PT       |        sequence number        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                           timestamp                           |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |           synchronization source (SSRC) identifier            |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |U|   R   |TYPE5|      LEN( always >3)          |   SIDX        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                                                               |
+      |                   sample description (no.bytes=LEN - 3)       |
+      |                                                               |
+      |                                                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |U|   R   |TYPE1|       LEN  (always >=8)       |    SIDX       |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                      SDUR                     |     TLEN      |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |      TLEN     |                                               |
+      +-+-+-+-+-+-+-+-+                                               |
+      |                  text string fragment (no.bytes=TLEN)         |
+      |                                                               |
+      |                                                               |
+      |                                               +-+-+-+-+-+-+-+-+
+      |                                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+     Figure 13.  An RTP packet carrying a TYPE 5 and a TYPE 1 unit
+
+   In Figure 13, a sample description and a TYPE 1 unit are aggregated.
+   The TYPE 1 unit happens to contain only text strings and is small, so
+   an additional TYPE 5 unit is included to take advantage of the
+   available bits in the packet.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |V=2|P|X| CC    |M|    PT       |        sequence number        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                           timestamp                           |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |           synchronization source (SSRC) identifier            |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |U|   R   |TYPE2|          LEN( always >9)      |TOTAL=4|THIS=1 |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                    SDUR                       |    SIDX       |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |               SLEN            |                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
+      |                  text string fragment (no.bytes=LEN - 9)      |
+      |                                                               |
+      :                                                               :
+      :                                                               :
+      |                                               +-+-+-+-+-+-+-+-+
+      |                                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+    Figure 14.  Payload with first text string fragment of a sample
+
+   In Figures 14, 15, and 16, a text sample is split into three RTP
+   packets.  In Figure 14, the text string is big and takes the whole
+   packet length.  In Figure 15, the only possibility for carrying two
+   fragments of the same text sample is represented (see configuration 3
+   in Section 4.6).  The last packet, shown in Figure 16, carries the
+   last modifier fragment, a TYPE 4.
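+
+   The role of the TOTAL and THIS fields in Figures 14 to 16 can be
+   sketched as follows.  The sketch assumes, as drawn in the figures,
+   that TOTAL occupies the upper four bits and THIS the lower four
+   bits of the octet following LEN, and that THIS counts from 1.  The
+   function names and the input shape are hypothetical, and the
+   sketch only concatenates fragment bodies; rebuilding the exact
+   sample layout is out of its scope.
+
```python
def split_total_this(octet):
    """Return (TOTAL, THIS) from the octet that follows LEN."""
    return octet >> 4, octet & 0x0F


def reassemble(fragments):
    """Concatenate fragment bodies in THIS order, or return None.

    `fragments` is an iterable of (total_this_octet, body) pairs that
    all belong to one text sample (hypothetical input shape).  None is
    returned while some fragment is still missing.
    """
    bodies = {}
    total = None
    for octet, body in fragments:
        frag_total, this = split_total_this(octet)
        if total is None:
            total = frag_total
        elif frag_total != total:
            raise ValueError("fragments disagree on TOTAL")
        bodies.setdefault(this, body)   # ignore repeated fragments
    if total is None or set(bodies) != set(range(1, total + 1)):
        return None
    return b"".join(bodies[i] for i in range(1, total + 1))


# Four fragments (TOTAL=4) arriving out of order; bodies are made up.
sample = reassemble([(0x42, b"lo "), (0x41, b"hel"),
                     (0x44, b"!"), (0x43, b"x")])
```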
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |V=2|P|X| CC    |M|    PT       |        sequence number        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                           timestamp                           |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |           synchronization source (SSRC) identifier            |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |U|   R   |TYPE2|          LEN( always >9)      |TOTAL=4|THIS=2 |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                    SDUR                       |    SIDX       |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |               SLEN            |                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
+      |                  text string fragment (no.bytes=LEN - 9)      |
+      |                                                               |
+      |                                                               |
+      |                                                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |U|   R   |TYPE3|        LEN( always >6)        |TOTAL=4|THIS=3 |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                      SDUR                     |               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+      |                                                               |
+      |                    modifiers (no.bytes=LEN - 6)               |
+      |                                               +-+-+-+-+-+-+-+-+
+      |                                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 15.  An RTP packet carrying a TYPE 2 unit and a TYPE 3 unit
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |V=2|P|X| CC    |M|    PT       |        sequence number        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                           timestamp                           |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |           synchronization source (SSRC) identifier            |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |U|   R   |TYPE4|        LEN( always >6)        |TOTAL=4|THIS=4 |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                      SDUR                     |               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+      |                                                               |
+      |                    modifiers (no.bytes=LEN - 6)               |
+      |                                               +-+-+-+-+-+-+-+-+
+      |                                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+     Figure 16.  An RTP packet carrying last modifiers fragment (TYPE 4)
+
+4.8.  Relation to RFC 3640
+
+   RFC 3640 [12] defines a payload format for the transport of any non-
+   multiplexed MPEG-4 elementary stream.  One of the various MPEG-4
+   elementary stream types is MPEG-4 timed text streams, specified in
+   MPEG-4 part 17 [26], also known as ISO/IEC 14496-17.  MPEG-4 timed
+   text streams are capable of carrying 3GPP timed text data, as
+   specified in 3GPP TS 26.245 [1].
+
+   MPEG-4 timed text streams are intentionally constructed so as to
+   guarantee interoperability between RFC 3640 and this payload format.
+   This means that the construction of the RTP packets carrying timed
+   text is the same.  That is, the MPEG-4 timed text elementary stream
+   as per ISO/IEC 14496-17 is identical to the (aggregate) payloads
+   constructed using this payload format.
+
+   Figure 17 illustrates the process of constructing an RTP packet
+   containing timed text.  As can be seen in the partition block, the
+   (transport) units used in this payload format are identical to the
+   Timed Text Units (TTUs) defined in ISO/IEC 14496-17.  Likewise, the
+   rules for payload aggregation as per Section 4.6 are identical to
+   those defined in ISO/IEC 14496-17 and are compliant with RFC 3640.
+   As a result, an RTP packet that uses this payload format is identical
+   to an RTP packet using RFC 3640 conveying TTUs according to ISO/IEC
+   14496-17.  In particular, MPEG-4 Part 17 specifies that when using
+   RFC 3640 for transporting timed text streams, the "streamType"
+   parameter value is set to 0x0D, and the value of the
+   "objectTypeIndication" in "config" takes the value 0x08.
+
+                +--------------------------------------+
+   Text samples | +--------------+   +--------------+  |
+   as per 3GPP  | |Text Sample 1 |   |Text Sample N |  |
+   TS 26.245    | +--------------+   +--------------+  |
+                +--------------------------------------+
+                                  \/
+   +-------------------------------------------------------------------+
+   | Partition Text Samples into units.  TTU[i]= TYPE i units.         |
+   |                                                                   |
+   |[U R TYPE LEN][{TOTAL,THIS}SIDX{SDUR}{TLEN}{SLEN}][SampleContents] |
+   |{..} means present if applicable, [..] means always present        |
+   +-------------------------------------------------------------------+
+                   \/                                \/
+   +-------------------------------------------------------------------+
+   |                      Aggregation (if possible)                    |
+   +-------------------------------------------------------------------+
+                   \/                                \/
+   +-------------------------------------------------------------------+
+   | RTP Entity adds and fills RTP header and Sends RTP packet, where  |
+   |  RTP packets according to this Payload Format =                   |
+   |  RTP packets carrying MPEG-4 Timed Text ES over RFC 3640          |
+   +-------------------------------------------------------------------+
+
+                     Figure 17.  Relation to RFC 3640
+
+   Note: The use of RFC 3640 for transport of ISO/IEC 14496-17 data does
+   not require any new SDP parameters or any new mode definition.
+
+4.9.  Relation to RFC 2793
+
+   RFC 2793 [22] and its revision, RFC 4103 [23], specify a protocol for
+   enabling text conversation.  Typical applications of this payload
+   format are text communication terminals and text conferencing tools.
+   Text session contents are specified in ITU-T Recommendation T.140
+   [24].  T.140 text is UTF-8 coded as specified in T.140 [24] with no
+   extra framing.  The T140block contains one or more T.140 code
+   elements as specified in T.140.  Code elements are control sequences
+   such as "New Line", "Interrupt", "String Terminator", or "Start of
+   String".  Most T.140 code elements are single ISO 10646 [25]
+   characters, but some are multiple character sequences.  Each
+   character is UTF-8 encoded [18] into one or more octets.
+
+   This payload format may also be used for conversational applications
+   (even for instant messaging).  However, this is not its main target.
+   The differentiating feature of 3GPP Timed Text media format is that
+   it allows text decoration.  This is especially useful in multimedia
+   presentations, karaoke, commercial banners, news tickers, clickable
+   text strings, and captions.  T.140 text contents used in RFC 2793 do
+   not allow the use of text decoration.
+
+   Furthermore, the conversational text RTP payload format recommends a
+   method to include redundant text from already transmitted packets in
+   order to reduce the risk of text loss caused by packet loss.  Thereby
+   payloads would include a redundant copy of the last payload sent.
+   This payload format does not describe such a method, but the same
+   approach is applicable here.  As explained in Section 5, packet
+   redundancy SHOULD be used whenever possible.  The aggregation
+   guidelines in Section 4.6 allow redundant payloads.
+
+5.  Resilient Transport
+
+   Apart from the basic fragmentation guidelines described in the
+   section above, the simplest option for packet-loss-resilient
+   transport is packet repetition.  This mechanism may consist of a
+   strict window-based repetition mechanism or, simply, a repetition
+   mechanism in a wider sense, where new and old packets are mixed, for
+   example.
+
+   A server MAY decide to use repetition as a measure for packet loss
+   resilience.  Thereby, a server MAY send the same RTP payloads or just
+   some of the units from the payloads.
+
+   As for the case of complete payloads, single repeated units MUST
+   exactly match the same units sent in the first transmission; i.e., if
+   fragmentation is needed, it SHALL be performed only once for each
+   text sample.  Only then can a receiver use the already received and
+   the repeated units to reconstruct the original text samples.  Since
+   the RTP timestamp is used to group together the fragments of a
+   sample, care must be taken to preserve the timing of units when
+   constructing new RTP packets.
+
+        For example, if a text sample was originally sent as a single
+        non-fragmented text sample (one TYPE 1 unit), a repetition of
+        that sample MUST be sent also as a single non-fragmented text
+        sample in one unit.  Likewise, if the original text sample was
+        fragmented and spread over several RTP packets (say, a total of
+        3 units), then the repeated fragments SHALL also have the same
+        byte boundaries and use the same unit headers and bytes per
+        fragment.
+
+   With repetition, repeated units resolve to the same timestamp as
+   their originals.  Where redundant units are available, only one of
+   them SHALL be used.
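+
+   One way a receiver might discard redundant copies is sketched
+   below.  The sketch relies on the statement above that repeated
+   units resolve to the same timestamp as their originals; the tuple
+   layout (timestamp, unit type, THIS, body) and the choice to keep
+   the first copy seen are assumptions made for illustration.
+
```python
def drop_redundant_units(units):
    """Keep the first copy of every unit, preserving arrival order.

    Each unit is a (timestamp, unit_type, this, body) tuple, where
    `this` is 0 for unfragmented units (hypothetical representation).
    """
    seen = set()
    unique = []
    for timestamp, unit_type, this, body in units:
        key = (timestamp, unit_type, this)
        if key in seen:
            continue                # a repetition of a unit already held
        seen.add(key)
        unique.append((timestamp, unit_type, this, body))
    return unique


received = [
    (1000, 1, 0, b"first sample"),
    (1300, 2, 1, b"frag 1"),
    (1000, 1, 0, b"first sample"),   # repeated TYPE 1 unit
    (1300, 3, 2, b"frag 2"),
    (1300, 2, 1, b"frag 1"),         # repeated fragment
]
kept = drop_redundant_units(received)
```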
+
+   Regarding the RTP header fields:
+
+   o If the whole RTP payload is repeated, all payload-specific fields
+     in the RTP header (the M, TS and PT fields) MUST keep their
+     original values.  The sequence number is the exception: it MUST
+     be incremented to comply with RTP (the TOTAL/THIS fields make it
+     possible to reassemble fragments with different sequence numbers).
+
+   o In packets containing single repeated units, the general rules in
+     Section 3 for assigning values to the RTP header fields apply.
+     Keeping the value of the RTP timestamp to preserve the timing of
+     the units is particularly relevant here.
+
+   Apart from repetition, other mechanisms such as FEC [7],
+   retransmission [11], or similar techniques could be used to cope with
+   packet losses.
+
+6.  Congestion Control
+
+   Congestion control for RTP SHALL be implemented in accordance with
+   RTP [3] and the applicable RTP profile, e.g., RTP/AVP [17].
+
+   When using this payload format, two main factors may affect
+   congestion control:
+
+   o The use of (unit) aggregation may make the payload format more
+     bandwidth efficient, by avoiding header overhead and thus reducing
+     the used bitrate.
+
+   o The use of resilient transport mechanisms: Although timed text
+     applications typically operate at low bitrates, the bitrate
+     increase due to resilient transport shall be taken into account
+     by congestion control mechanisms.  This applies to all mechanisms
+     but especially to less efficient ones like repetition.
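+
+   The effect of both factors can be estimated with simple
+   arithmetic, sketched below.  The sketch assumes IPv4 without
+   options (20 bytes), UDP (8 bytes), and a 12-byte RTP header with no
+   CSRC entries or header extension; it ignores lower-layer overhead
+   and path MTU limits.  The unit sizes and the function name are
+   made up for illustration.
+
```python
IPV4_HEADER = 20   # bytes, without options
UDP_HEADER = 8     # bytes
RTP_HEADER = 12    # bytes, without CSRC entries or header extension
PER_PACKET = IPV4_HEADER + UDP_HEADER + RTP_HEADER


def bytes_on_wire(unit_sizes, aggregated, copies=1):
    """Total bytes sent for the given unit sizes.

    `aggregated` puts all units into one packet; `copies` is the number
    of times every packet is sent (1 means no repetition).
    """
    packets = 1 if aggregated else len(unit_sizes)
    return copies * (packets * PER_PACKET + sum(unit_sizes))


units = [60, 45, 80]   # unit sizes in bytes, made up for illustration
separate = bytes_on_wire(units, aggregated=False)
together = bytes_on_wire(units, aggregated=True)
repeated = bytes_on_wire(units, aggregated=True, copies=2)
```
+
+   Under these assumptions, aggregation saves two packet headers (80
+   bytes), whereas sending every packet twice doubles the total.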
+
+7.  Scene Description
+
+7.1.  Text Rendering Position and Composition
+
+   In order to set up a timed text session, regardless of the stream
+   being stored in a 3GP file or streamed live, some initial layout
+   information is needed by the communicating peers.
+
+      +-------------------------------------------+
+      |      <-> tx                               |    +-------------+
+      |     +-------------------------------+     |<---|Display Area |
+      |  ^  |                               |     |    +-------------+
+      |  :  |                               |     |
+      |  :ty|                               |     |    +-------------+
+      |  :  |                               |<---------|Video track  |
+      |  :  |                               |     |    +-------------+
+      |  :  |                               |     |
+      |  :  |                               |     |
+      |  :  |                               |     |
+      |  v  |                               |     |
+      |  -  |   x-------------------------+ |     |    +-------------+
+      |h ^  |   |                         |<-----------|Text Track   |
+      |e :  +---|-------------------------|-+     |    +-------------+
+      |i :      | +---------------------+ |       |
+      |g :      | |                     | |       |    +-------------+
+      |h :      | |                     |<------------ |Text Box     |
+      |t v      | +---------------------+ |       |    +-------------+
+      |  -      +-------------------------+       |
+      +-------------------------------------------+
+                <........................>
+                        w i d t h
+
+   Figure 18.  Illustration of text rendering position and composition
+
+   The parameters used for negotiating the position and size of the text
+   track in the display area are shown in Figure 18.  These are the
+   "width" and "height" of the text track, its translation values, "tx"
+   and "ty", and its "layer" or proximity to the user.
+
+   At the same time, the sender of the stream needs to know the
+   receiver's capabilities.  In this case, these are the maximum
+   allowable values for the text track height and width, "max-h" and
+   "max-w", for the stream the receiver shall display.
+
+   This layout information MUST be conveyed in a reliable form before
+   the start of the session, e.g., during session announcement or in an
+   Offer/Answer (O/A) exchange.  An example of a reliable transport may
+   be the out-of-band channel used for SDP.  Sections 8 and 9 provide
+   details on the mapping of these parameters to SDP descriptions and
+   their usage in O/A.
+
+   For stored content, the layout values expressing stream properties
+   MUST be obtained from the Track Header Box.  See Section 7.3.
+
+   For live streaming, appropriate values as negotiated during session
+   setup shall be used.
+
+7.2.  SMIL Usage
+
+   The attributes contained in the Track Header Boxes of a 3GP file only
+   specify the spatial relationship of the tracks within the given 3GP
+   file.
+
+   If multiple 3GP files are sent, they require spatial synchronization.
+   For example, for a text and video stream, the positions of the text
+   and video tracks in Figure 18 shall be determined.  For this purpose,
+   SMIL [9] MAY be used.
+
+   SMIL assigns regions in the display to each of those files and places
+   the tracks within those regions.  Generally, in SMIL, the position of
+   one track (or stream) is expressed relative to another track.  This
+   is different from the 3GP file, where the upper left corner is the
+   reference for all translation offsets.  Hence, only if the position
+   in SMIL is relative to the video track origin does this translation
+   offset have the same value as (tx, ty) in the 3GP file.
+
+   Note also that the original track header information is used for each
+   track only within its region, as assigned by SMIL.  Therefore, even
+   if SMIL scene description is used, the track header information
+   pieces SHOULD be sent anyway, as they represent the intrinsic media
+   properties.  See 3GPP SMIL Language Profile in [27] for details.
+
+7.3.  Finding Layout Values in a 3GP File
+
+   In a 3GP file, within the Track Header Box (tkhd):
+
+        o tx, ty: These values specify the translation offset of the
+          (text) track relative to the upper left corner of the video
+          track, if present.  They are the third-to-last and
+          second-to-last values, respectively, in the unity matrix;
+          values are fixed-point 16.16 values, restricted to be (signed)
+          integers (i.e., the lower 16 bits of each value shall be all
+          zeros).  Therefore, only the upper 16 bits are used for
+          obtaining the value of the media type parameters.
+
+        o width, height: They have the same name in the tkhd box.  All
+          (unsigned) 32 bits are meaningful.
+
+        o layer: All (signed) 16 bits are used.
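+
+   The conversion of a tx or ty matrix entry into the integer used
+   for the media type parameter can be sketched as follows.  The
+   function name is hypothetical, and the sketch assumes that the
+   entry has already been read from the tkhd box as an unsigned
+   32-bit value; locating and reading the box is not shown.
+
```python
def translation_from_fixed(raw):
    """Convert a raw 32-bit 16.16 fixed-point matrix entry to an integer."""
    if not 0 <= raw <= 0xFFFFFFFF:
        raise ValueError("expected an unsigned 32-bit value")
    if raw & 0xFFFF:
        # The lower 16 bits (the fractional part) shall be all zeros.
        raise ValueError("fractional part must be zero")
    upper = raw >> 16
    # Reinterpret the upper 16 bits as a two's-complement signed integer.
    return upper - 0x10000 if upper & 0x8000 else upper


tx = translation_from_fixed(0x00100000)   # 16.0 in 16.16 fixed point
ty = translation_from_fixed(0xFFF00000)   # -16.0 in 16.16 fixed point
```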
+
+8.  3GPP Timed Text Media Type
+
+   The media subtype for the 3GPP Timed Text codec is allocated from the
+   standards tree.  The top-level media type under which this payload
+   format is registered is 'video'.  This registration is done using the
+   template defined in [29] and following RFC 3555 [28].
+
+   The receiver MUST ignore any unrecognized parameter.
+
+   Media type: video
+
+   Media subtype: 3gpp-tt
+
+   Required parameters:
+
+        rate:
+                Refer to Section 3 in RFC 4396.
+
+        sver:
+                The parameter "sver" contains a list of supported
+                backwards-compatible versions of the timed text format
+                specification (3GPP TS 26.245) that the sender is
+                willing to receive (these are the same versions that it
+                would be willing to send).  The first value is the one
+                preferred for receiving (or for sending).  The first
+                value MAY be followed by a comma-separated list of
+                versions that SHOULD be used as alternatives.  The order
+                is meaningful: the first entry is the most preferred and
+                the last the least preferred.  Each entry has the format
+                Zi(xi*256+yi), where "Zi" is the number of the Release
+                and "xi" and "yi" are taken from the 3GPP specification
+                version (i.e., vZi.xi.yi).  For example, for 3GPP TS
+                26.245 v6.0.0, Zi(xi*256+yi)=6(0), the version value is
+                "60".  (Note that "60" is the concatenation of the
+                values Zi=6 and (xi*256+yi)=0 and not their product.)
+
+                If no "sver" value is available, for example, when
+                streaming out of a 3GP file, the default value "60",
+                corresponding to the 3GPP Release 6 version of 3GPP TS
+                26.245, SHALL be used.
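+
+   The construction of "sver" entries described above can be sketched
+   as follows.  The function names are hypothetical, and the version
+   6.1.0 used in the example is made up to show a non-zero second
+   part; only v6.0.0, giving "60", is taken from the text above.
+
```python
def sver_entry(release, x, y):
    """Format one "sver" entry for specification version vRELEASE.x.y.

    The release number and the value x*256+y are concatenated as
    decimal strings; they are not multiplied.
    """
    return f"{release}{x * 256 + y}"


def sver_value(versions):
    """Join entries, most preferred first, into a comma-separated list."""
    return ",".join(sver_entry(*version) for version in versions)


default = sver_entry(6, 0, 0)
offered = sver_value([(6, 1, 0), (6, 0, 0)])
```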
+
+   Optional parameters:
+
+        tx:
+                This parameter indicates the horizontal translation
+                offset in pixels of the text track with respect to the
+                origin of the video track.  This value is the decimal
+                representation of a 16-bit signed integer.  Refer to
+                3GPP TS 26.245 for an illustration of this parameter.
+
+        ty:
+                This parameter indicates the vertical translation offset
+                in pixels of the text track with respect to the origin
+                of the video track.  This value is the decimal
+                representation of a 16-bit signed integer.  Refer to
+                3GPP TS 26.245 for an illustration of this parameter.
+
+        layer:
+                This parameter indicates the proximity of the text track
+                to the viewer.  More negative values mean closer to the
+                viewer.  This parameter has no units.  This value is the
+                decimal representation of a 16-bit signed integer.
+
+        tx3g:
+                This parameter MUST be used for conveying sample
+                descriptions out-of-band.  It contains a comma-separated
+                list of base64-encoded entries.  The entries of this
+                list MAY appear in any order, and the list SHALL NOT be
+                empty.  Each entry is the result of running
+                base64 encoding over the concatenation of the (static)
+                SIDX value as an 8-bit unsigned integer and the (static)
+                sample description for that SIDX, in that order.  The
+                format of a sample description entry can be found in
+                3GPP TS 26.245 Release 6 and later releases.  All
+                servers and clients MUST understand this parameter and
+                MUST be capable of using the sample description(s)
+                contained in it.  Please refer to RFC 3548 [6] for
+                details on the base64 encoding.
+
+        width:
+                This parameter indicates the width in pixels of the text
+                track or area of the text being sent.  This value is the
+                decimal representation of a 32-bit unsigned integer.
+                Refer to 3GPP TS 26.245 for an illustration of this
+                parameter.
+
+        height:
+                This parameter indicates the height in pixels of the
+                text track being sent.  This value is the decimal
+                representation of a 32-bit unsigned integer.  Refer to
+                3GPP TS 26.245 for an illustration of this parameter.
+
+        max-w:
+                This parameter indicates display capabilities.  This is
+                the maximum "width" value that the sender of this
+                parameter supports.  This value is the decimal
+                representation of a 32-bit unsigned integer.
+
+        max-h:
+                This parameter indicates display capabilities.  This is
+                the maximum "height" value that the sender of this
+                parameter supports.  This value is the decimal
+                representation of a 32-bit unsigned integer.
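+
+   The encoding of the "tx3g" parameter described above can be
+   sketched as follows.  The function names are hypothetical, and the
+   sample description bytes in the example are placeholders rather
+   than valid sample description entries.  Python's standard base64
+   module is used; its default alphabet contains no comma, so the
+   entries can be split on commas safely.
+
```python
import base64


def tx3g_entry(sidx, sample_description):
    """Base64-encode the SIDX octet followed by its sample description."""
    if not 0 <= sidx <= 255:
        raise ValueError("SIDX is an 8-bit unsigned integer")
    raw = bytes([sidx]) + sample_description
    return base64.b64encode(raw).decode("ascii")


def tx3g_value(entries):
    """Build the comma-separated parameter value from (SIDX, bytes) pairs."""
    if not entries:
        raise ValueError("the tx3g list must not be empty")
    return ",".join(tx3g_entry(sidx, desc) for sidx, desc in entries)


def parse_tx3g(value):
    """Map each SIDX to its sample description bytes."""
    result = {}
    for item in value.split(","):
        raw = base64.b64decode(item)
        result[raw[0]] = raw[1:]
    return result


# Placeholder bytes stand in for real sample descriptions.
value = tx3g_value([(1, b"\x00\x01desc"), (2, b"other")])
decoded = parse_tx3g(value)
```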
+
+   Encoding considerations:
+
+        This media type is framed (see Section 4.8 in [29]) and
+        partially contains binary data.
+
+   Restrictions on usage:
+
+        This media type depends on RTP framing, and hence is only
+        defined for transfer via RTP [3].  Transport within other
+        framing protocols is not defined at this time.
+
+   Security considerations:
+
+        Please refer to Section 11 of RFC 4396.
+
+   Interoperability considerations:
+
+        The 3GPP Timed Text media format and its file storage are
+        specified in Release 6 of 3GPP TS 26.245, "Transparent
+        end-to-end packet switched streaming service (PSS); Timed Text
+        Format (Release 6)".  Note also that 3GPP may in future releases
+        specify extensions or updates to the timed text media format in
+        a backwards-compatible way, e.g., new modifier boxes or
+        extensions to the sample descriptions.  The payload format
+        defined in RFC 4396 allows for such extensions.  For future 3GPP
+        Releases of the Timed Text Format, the parameter "sver" is used
+        to identify the exact specification used.
+
+        The defined storage format for the 3GPP Timed Text format is
+        the 3GPP File Format (3GP) [30].  3GP files may be transferred
+        using the media type video/3gpp as registered by RFC 3839 [31].
+        The 3GPP File Format is a container file that may contain,
+        e.g., audio and video that may be synchronized with the 3GPP
+        Timed Text.
+
+   Published specification: RFC 4396
+
+   Applications which use this media type:
+
+        Multimedia streaming applications.
+
+   Additional information:
+
+        The 3GPP Timed Text media format is specified in 3GPP TS 26.245,
+        "Transparent end-to-end packet switched streaming service (PSS);
+        Timed Text Format (Release 6)".  This document and future
+        extensions to the 3GPP Timed Text format are publicly available
+        at http://www.3gpp.org.
+
+        Magic number(s): None.
+
+        File extension(s): None.
+
+        Macintosh File Type Code(s): None.
+
+   Person & email address to contact for further information:
+
+        Jose Rey, jose.rey@eu.panasonic.com
+        Yoshinori Matsui, matsui.yoshinori@jp.panasonic.com
+        Audio/Video Transport Working Group.
+
+   Intended usage: COMMON
+
+   Authors:
+        Jose Rey
+        Yoshinori Matsui
+
+   Change controller: IETF Audio/Video Transport Working Group delegated
+        from the IESG.
+
+9.  SDP Usage
+
+9.1.  Mapping to SDP
+
+   The information carried in the media type specification has a
+   specific mapping to fields in SDP [4].  If SDP is used to specify
+   sessions using this payload format, the mapping is done as follows:
+
+   o The media type ("video") goes in the SDP "m=" line as the media
+     name.
+
+       m=video <port number> RTP/<RTP profile> <dynamic payload type>
+
+   o The media subtype ("3gpp-tt") and the timestamp clockrate "rate"
+     (the RECOMMENDED 1000 Hz or another value) go in the SDP
+     "a=rtpmap" line as the encoding name and rate, respectively:
+
+       a=rtpmap:<payload type> 3gpp-tt/1000
+
+   o The REQUIRED parameter "sver" goes in the SDP "a=fmtp" attribute by
+     copying it directly from the media type string as a semicolon-
+     separated parameter=value pair.
+
+   o The OPTIONAL parameters "tx", "ty", "layer", "tx3g", "width",
+     "height", "max-w", and "max-h" go in the SDP "a=fmtp" attribute by
+     copying them directly from the media type string as a semicolon-
+     separated list of parameter=value(s) pairs:
+
+       a=fmtp:<dynamic payload type> <parameter name>=<value>[,<value>]
+       [; <parameter name>=<value>]
+
+   o Any parameter unknown to the device that uses the SDP SHALL be
+     ignored.  For example, parameters added to the media format in
+     later specifications MAY be copied into the SDP and SHALL be
+     ignored by receivers that do not understand them.
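+
+   As an illustration only, the mapping above could be sketched in
+   Python as follows.  The helper name build_sdp_media, its arguments,
+   and the port number 5004 are hypothetical; the sketch does not
+   validate parameter values.
+
```python
def build_sdp_media(port, payload_type, params, profile="AVP", rate=1000):
    """Return the SDP lines for one 3gpp-tt stream (hypothetical helper).

    params maps parameter names to a value or a list of values,
    e.g. {"sver": [6256, 60], "tx": 100}.
    """
    if "sver" not in params:
        raise ValueError('the REQUIRED parameter "sver" is missing')

    def fmt(value):
        # Several values of one parameter are comma-separated.
        if isinstance(value, (list, tuple)):
            return ",".join(str(v) for v in value)
        return str(value)

    # Parameters are copied as semicolon-separated name=value pairs.
    fmtp = "; ".join(f"{name}={fmt(value)}" for name, value in params.items())
    return [
        f"m=video {port} RTP/{profile} {payload_type}",
        f"a=rtpmap:{payload_type} 3gpp-tt/{rate}",
        f"a=fmtp:{payload_type} {fmtp}",
    ]


# Port 5004 and payload type 98 are arbitrary example values.
lines = build_sdp_media(5004, 98, {"tx": 100, "ty": 100, "layer": 0,
                                   "max-h": 120, "max-w": 160,
                                   "sver": [6256, 60]})
```
+
+   A receiver would do the reverse and simply skip any parameter name
+   it does not recognize.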
+
+9.2.  Parameter Usage in the SDP Offer/Answer Model
+
+   In this section, the meaning of the SDP parameters defined in this
+   document within the Offer/Answer [13] context is explained.
+
+   In unicast, sender and receiver typically negotiate the streams,
+   i.e., which codecs and parameter values are used in the session.
+   This is also possible in multicast to a lesser extent.
+
+   Additionally, the meaning of the parameters MAY vary depending on
+   which direction is used.  In the following sections, a
+   "<directionality> offer" means an offer that contains a stream set to
+   <directionality>.  <directionality> may take the values sendrecv,
+   sendonly, and recvonly.  Similar considerations apply for answers.
+   For example, an answer to a sendonly offer is a recvonly answer.
+
+9.2.1. Unicast Usage
+
+   The following types of parameters are used in this payload format:
+
+     1. Declarative parameters: Offerer and answerer declare the values
+        they will use for the incoming (sendrecv/recvonly) or outgoing
+        (sendonly) stream.  Offerer and answerer MAY use different
+        values.
+
+          a. "tx", "ty", and "layer": These are parameters describing
+             where the received text track is placed.  Depending on the
+             directionality:
+
+              i. They MUST appear in all sendrecv offers and answers and
+                 in all recvonly offers and answers (thus applying to
+                 the incoming stream).  In the case of sendrecv offers
+                 and answers and in recvonly offers, these values SHOULD
+                 be used by the sender of the stream unless it has a
+                 particular preference, in which case, it MUST make sure
+                 that these different values do not corrupt the
+                 presentation.  For recvonly answers, the answerer MAY
+                 accept the proposed values for the incoming stream (in
+                 a sendonly offer; see ii. below) or respond with
+                 different ones.  The offerer MUST use the returned
+                 values.
+
+             ii. They MAY appear in sendonly offers and MUST appear in
+                 sendonly answers.  In sendonly offers, they specify the
+                 values that the offerer proposes for sending (see
+                 example in Section 9.3).  In sendonly answers, these
+                 values SHOULD be copied from the corresponding recvonly
+                 offer upon accepting the stream, unless a particular
+                 preference by the receiver of the stream exists, as
+                 explained in the previous point.
+
+     2. Parameters describing the display capabilities, "max-h" and
+        "max-w", which indicate the maximum dimensions of the text track
+        (text display area) for the incoming stream "tx" and "ty" values
+        (see Figure 18).  "max-h" and "max-w" MUST be included in all
+        offers and answers where "tx" and "ty" refer to the incoming
+        stream, thus excluding sendonly offers and answers (see example
+        in Section 9.3), where they SHALL NOT be present.
+
+     3. Parameters describing the sent stream properties, i.e., the
+        sender of the stream decides upon the values of these:
+
+          a. "width" and "height" specify the text track dimensions.
+             They SHALL ALWAYS be present in sendrecv and sendonly
+             offers and answers.  For recvonly answers, the answerer
+             MUST include the offered parameter values (if any) verbatim
+             in the answer upon accepting the stream.
+
+          b. "tx3g" contains static sample descriptions.  It MAY only be
+             present in sendrecv and sendonly offers and answers.  This
+             parameter applies to the stream that offerers or answerers
+             send.
+
+     4. Negotiable parameters, which MUST be agreed on.  This is the
+        case of "sver".  This parameter MUST be present in every offer
+        and answer.  The answerer SHALL choose one supported value from
+        the offerer's list, or else it MUST remove the stream or reject
+        the session.
+
+     5. Symmetric parameters: "rate", timestamp clockrate, belongs to
+        this class.  Symmetric parameters MUST be echoed verbatim in the
+        answer.  Otherwise, the stream MUST be removed or the session
+        rejected.
+
+   The following table summarizes all options:
+
+     +..---------------------------+----------+----------+----------+
+     |   ``--..__  Directionality/ | sendrecv | recvonly | sendonly |
+     + Type of   ``--..__   O or A +----------+----------+----------+
+     |    Parameter      ``--..__  |   O/A    |   O/A    |   O/A    |
+     +--------------+------------``+----------+----------+----------+
+     | Declarative  |tx, ty, layer |   M/M    |   M/M    |   m/M    |
+     |              |              |          |          |          |
+     +--------------+--------------+----------+----------+----------+
+     | Display      |max-h, max-w  |   M/M    |   M/M    |   -/-    |
+     | Capabilities |              |          |          |          |
+     +--------------+--------------+----------+----------+----------+
+     | Stream       |height, width |   M/M    |   -/(M)  |   M/M    |
+     | properties   |tx3g          |   m/m    |   -/-    |   m/m    |
+     |              |              |          |          |          |
+     +--------------+--------------+----------+----------+----------+
+     |  Negotiable  |sver          |   M/M    |   M/M    |   M/M    |
+     |              |              |          |          |          |
+     +--------------+--------------+----------+----------+----------+
+     |  Symmetric   |rate          |   M/M    |   M/M    |   M/M    |
+     +--------------+--------------+----------+----------+----------+
+
+          Table 1.  Parameter usage in Unicast Offer / Answer.
+
+   KEY:
+        o M means MUST be present.
+        o m means MAY be present (such as proposed values).
+        o (M) or (m) means MUST or MAY, if applicable.
+        o a hyphen ("-") means the parameter MUST NOT be present.
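+
+   Table 1 lends itself to a data-driven check.  The following Python
+   sketch is one possible encoding, not part of the specification: the
+   names TABLE_1 and check_presence are hypothetical, "rate" is left
+   out because it travels in "a=rtpmap" rather than "a=fmtp", and the
+   "(M)" entries are not enforced since they depend on the offer.
+
```python
# Table 1 as data: parameter -> directionality -> (offer rule, answer rule).
# "M" = MUST, "m" = MAY, "(M)" = MUST if applicable, "-" = MUST NOT.
DECLARATIVE = {"sendrecv": ("M", "M"), "recvonly": ("M", "M"),
               "sendonly": ("m", "M")}
CAPABILITY = {"sendrecv": ("M", "M"), "recvonly": ("M", "M"),
              "sendonly": ("-", "-")}
DIMENSIONS = {"sendrecv": ("M", "M"), "recvonly": ("-", "(M)"),
              "sendonly": ("M", "M")}
TX3G = {"sendrecv": ("m", "m"), "recvonly": ("-", "-"),
        "sendonly": ("m", "m")}
ALWAYS = {"sendrecv": ("M", "M"), "recvonly": ("M", "M"),
          "sendonly": ("M", "M")}

TABLE_1 = {
    "tx": DECLARATIVE, "ty": DECLARATIVE, "layer": DECLARATIVE,
    "max-h": CAPABILITY, "max-w": CAPABILITY,
    "height": DIMENSIONS, "width": DIMENSIONS,
    "tx3g": TX3G,
    "sver": ALWAYS,
}


def check_presence(names, direction, is_answer):
    """Return the Table 1 violations for a set of fmtp parameter names."""
    problems = []
    for name, rules in TABLE_1.items():
        rule = rules[direction][1 if is_answer else 0]
        if rule == "M" and name not in names:
            problems.append(f"{name} MUST be present")
        elif rule == "-" and name in names:
            problems.append(f"{name} MUST NOT be present")
    return problems


sendonly_offer = {"tx", "ty", "layer", "height", "width", "sver", "tx3g"}
ok = check_presence(sendonly_offer, "sendonly", is_answer=False)
bad = check_presence(sendonly_offer | {"max-h"}, "sendonly", is_answer=False)
```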
+
+   Other observations regarding parameter usage:
+
+     o Translation and layer values: In sendonly offers, "tx",
+       "ty", and "layer" indicate proposed values.  This is useful for
+       visually composed sessions where the different streams occupy
+       different parts of the display, e.g., a video stream and the
+       captions.  These are just suggested values; the peer rendering
+       the text ultimately decides where to place the text track.
+
+     o Text track (area) dimensions, "height" and "width": In the case
+       of sendonly offers, an answerer accepting the offer MUST be
+       prepared to render the stream using these values.  If it cannot
+       do so, the stream MUST be removed or the session rejected.
+
+     o Display capabilities, "max-h" and "max-w": An answerer sending a
+       stream SHALL ensure that the "height" and "width" values in the
+       answer are compatible with the offerer's signaled capabilities.
+
+     o Version handling via "sver": The idea is that offerer and
+       answerer communicate using the same version.  This is achieved by
+       letting the answerer choose from a list of supported versions,
+       "sver".  For recvonly streams, the first value in the list is the
+       preferred version to receive.  Consequently, for sendonly (and
+       sendrecv) streams, the first value is the one preferred for
+       sending (and receiving).  The answerer MUST choose one value and
+       return it in the answer.  Upon receiving the answer, the offerer
+       SHALL be prepared to send (sendonly and sendrecv) and receive
+       (recvonly and sendrecv) a stream using that version.  If none of
+       the versions in the list is supported, the stream MUST be removed
+       or the session rejected.  Note that, if alternative non-
+       compatible versions are offered, then this SHALL be done using
+       different payload types.
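+
+   The unicast version selection described above might look as follows
+   in Python.  This is a sketch under one assumption not mandated by
+   the text: the answerer honors the offerer's preference order and
+   picks the first offered version it supports.  The function names
+   are hypothetical.
+
```python
def parse_sver(value):
    """Split an "sver" value such as "6256,60" into a list of integers."""
    return [int(v) for v in value.split(",")]


def choose_sver(offered, supported):
    """Return the first offered version that is supported, else None.

    None means the stream must be removed or the session rejected.
    """
    for version in offered:
        if version in supported:
            return version
    return None


# The offer lists 6256 first (preferred) and 60 second.
chosen = choose_sver(parse_sver("6256,60"), supported={60})
```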
+
+9.2.2.  Multicast Usage
+
+   In multicast, the parameter usage is similar to the unicast case,
+   except as follows:
+
+   o The parameters "tx", "ty", and "layer" in multicast offers only
+     have meaning for sendrecv and recvonly streams.  In order for all
+     clients to have the same view of the session, they MUST be used
+     symmetrically.
+
+   o For "height", "width", and "tx3g" (for sendrecv and sendonly),
+     multicast offers specify which values of these parameters the
+     participants MUST use for sending.  Thus, if the stream is
+     accepted, the answerer MUST also include them verbatim in the
+     answer (also "tx3g", if present).
+
+   o The capability parameters, "max-h" and "max-w", SHALL NOT be used
+     in multicast.  If the offered text track should change in size, a
+     new offer SHALL be used instead.
+
+   o Regarding version handling:
+
+     In the case of multicast offers, an answerer MAY accept a multicast
+     offer as long as one of the versions listed in the "sver" is
+     supported.  Therefore, if the stream is accepted, the answerer MUST
+     choose its preferred version, but, unlike in unicast, the offerer
+     SHALL NOT change the offered stream to this chosen version because
+     there may be other session participants that do support the newer
+     extensions.  Consequently, different session participants may end
+     up using different backwards-compatible media format versions.  It
+     is RECOMMENDED that the multicast offer contains a limited number
+     of versions, in order for all participants to have the same view of
+     the session.  This is a responsibility of the session creator.  If
+     none of the offered versions is supported, the stream SHALL be
+     removed or the session rejected.  Also in this case, if alternative
+     non-compatible versions are offered, then this SHALL be done using
+     different payload types.
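+
+   A minimal Python sketch of how an answerer might derive its answer
+   to a multicast offer is given below.  It assumes that "used
+   symmetrically" means echoing "tx", "ty", and "layer" unchanged, and
+   that the answerer supplies its versions in its own order of
+   preference; multicast_answer and ECHOED are hypothetical names.
+
```python
# Parameters copied from a multicast offer into the answer, if present.
ECHOED = ("tx", "ty", "layer", "height", "width", "tx3g")


def multicast_answer(offer, preferred):
    """Return the answer's fmtp parameters, or None to reject the stream.

    offer maps parameter names to values ("sver" is a list); preferred
    lists the answerer's supported versions, most preferred first.
    """
    chosen = next((v for v in preferred if v in offer["sver"]), None)
    if chosen is None:
        return None  # no common version: remove stream or reject session
    # "max-h" and "max-w" are never copied: they are not used in multicast.
    answer = {name: offer[name] for name in ECHOED if name in offer}
    answer["sver"] = [chosen]
    return answer


offer = {"tx": 100, "ty": 100, "layer": 0, "height": 80, "width": 100,
         "sver": [6256, 60]}
answer = multicast_answer(offer, preferred=[60, 6256])
```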
+
+9.3.  Offer/Answer Examples
+
+   In these unicast Offer/Answer (O/A) examples, long lines are wrapped.
+   Static sample descriptions are shortened for clarity.
+
+   For sendrecv:
+
+   O -> A
+
+   m=video <port> RTP/AVP 98
+   a=rtpmap:98 3gpp-tt/1000
+   a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100; max-h=120;
+   max-w=160; sver=6256,60; tx3g=81...
+   a=sendrecv
+
+   A -> O
+
+   m=video <port> RTP/AVP 98..
+   a=rtpmap:98 3gpp-tt/1000
+   a=fmtp:98 tx=100; ty=95; layer=0; height=90; width=100; max-h=100;
+   max-w=160; sver=60; tx3g=82...
+   a=sendrecv
+
+   In this example, the offerer tells the answerer where it will place
+   the received stream and the maximum height and width allowed for
+   that stream.  It also tells the answerer the dimensions of the text
+   track for the stream it sends and which sample description that
+   stream uses.  It offers two versions, 6256 and 60.  The answerer
+   responds with an equivalent set of parameters
+   for the stream it receives.  In this case, the answerer's "max-h" and
+   "max-w" are compatible with the offerer's "height" and "width".
+   Otherwise, the answerer would have to remove this stream, and the
+   offerer would have to issue a new offer taking the answerer's
+   capabilities into account.  This is possible only if multiple payload
+   types are present in the initial offer so that at least one of them
+   matches the answerer's capabilities as expressed by "max-h" and
+   "max-w" in the negative answer.  Note also that the answerer's text
+   box dimensions fit within the maximum values signaled in the offer.
+   Finally, the answerer chooses to use version 60 of the timed text
+   format.
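+
+   The compatibility checks in this example can be reproduced with a
+   few lines of Python.  The sketch assumes that "compatible" means
+   the sent text track is no larger than the peer's signaled maximum
+   in either dimension; the function name fits is hypothetical.
+
```python
def fits(stream, capabilities):
    """True if a sent text track fits within a receiver's display limits."""
    return (stream["height"] <= capabilities["max-h"]
            and stream["width"] <= capabilities["max-w"])


# Values taken from the sendrecv offer and answer above.
offer = {"height": 80, "width": 100, "max-h": 120, "max-w": 160}
answer = {"height": 90, "width": 100, "max-h": 100, "max-w": 160}

offer_ok = fits(offer, answer)    # offerer's stream vs. answerer's limits
answer_ok = fits(answer, offer)   # answerer's stream vs. offerer's limits
```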
+
+   For recvonly:
+
+   O -> A
+
+   m=video <port> RTP/AVP 98
+   a=rtpmap:98 3gpp-tt/1000
+   a=fmtp:98 tx=100; ty=100; layer=0; max-h=120; max-w=160; sver=6256,60
+   a=recvonly
+
+   A -> O
+
+   m=video <port> RTP/AVP 98..
+   a=rtpmap:98 3gpp-tt/1000
+   a=fmtp:98 tx=100; ty=100; layer=0; height=90; width=100; sver=60;
+   tx3g=82...
+   a=sendonly
+
+   In this case, the offer is different from the previous case: It does
+   not include the stream properties "height", "width", and "tx3g".  The
+   answerer copies the "tx", "ty", and "layer" values, thus
+   acknowledging these.  "max-h" and "max-w" are not present in the
+   answer because the "tx" and "ty" (and "layer") in this special case
+   do not apply to the received stream, but to the sent stream.  Also,
+   if offerer and answerer had very different display sizes, it would
+   not be possible to express the answerer's capabilities.  In the
+   example above and for an answerer with a 50x50 display, the
+   translation values are already out of range.
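+
+   To make the last remark concrete, the following Python sketch
+   parses the offered "a=fmtp" line and compares the translation
+   values with a 50x50 display.  The parser parse_fmtp is a
+   hypothetical helper, and treating an offset at or beyond the
+   display edge as "out of range" is an assumption of this sketch.
+
```python
def parse_fmtp(line):
    """Parse 'a=fmtp:<pt> name=value; ...' into (payload type, dict).

    Every value is returned as a list of strings, because a parameter
    may carry several comma-separated values (e.g. "sver").
    """
    head, _, rest = line.partition(" ")
    payload_type = int(head[len("a=fmtp:"):])
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the first "=" only, so "=" inside a value survives.
        name, _, value = item.partition("=")
        params[name.strip()] = value.strip().split(",")
    return payload_type, params


pt, params = parse_fmtp(
    "a=fmtp:98 tx=100; ty=100; layer=0; max-h=120; max-w=160; sver=6256,60")
tx, ty = int(params["tx"][0]), int(params["ty"][0])

display_w = display_h = 50  # the answerer's 50x50 display from the text
out_of_range = tx >= display_w or ty >= display_h
```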
+
+   For sendonly:
+
+   O -> A
+
+   m=video <port> RTP/AVP 98
+   a=rtpmap:98 3gpp-tt/1000
+   a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100;
+   sver=6256,60; tx3g=81...
+   a=sendonly
+
+   A -> O
+
+   m=video <port> RTP/AVP 98..
+   a=rtpmap:98 3gpp-tt/1000
+   a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100; max-h=100;
+   max-w=160; sver=60
+   a=recvonly
+
+   Note that "max-h" and "max-w" are not present in the offer.  Also,
+   with this answer, the answerer would accept the offer as is (thus
+   echoing "tx", "ty", "height", "width", and "layer") and additionally
+   inform the offerer about its capabilities: "max-h" and "max-w".
+
+   Another possible answer for this case would be:
+
+   A -> O
+
+   m=video <port> RTP/AVP 98..
+   a=rtpmap:98 3gpp-tt/1000
+   a=fmtp:98 tx=120; ty=105; layer=0; max-h=95; max-w=150; sver=60
+   a=recvonly
+
+   In this case, the answerer does not accept the values offered.  The
+   offerer MUST use these values or else remove the stream.
+
+9.4.  Parameter Usage outside of Offer/Answer
+
+   SDP may also be employed outside of the Offer/Answer context, for
+   instance for multimedia sessions that are announced through the
+   Session Announcement Protocol (SAP) [14] or streamed through the Real
+   Time Streaming Protocol (RTSP) [15].
+
+   In this case, the receiver of a session description is required to
+   support the parameters and given values for the streams, or else it
+   MUST reject the session.  It is the responsibility of the sender (or
+   creator) of the session descriptions to define the session parameters
+   so that the probability of unsuccessful session setup is minimized.
+   This is out of the scope of this document.
+
+10.  IANA Considerations
+
+   IANA has registered the media subtype name "3gpp-tt" for the media
+   type "video" as specified in Section 8 of this document.
+
+11.  Security Considerations
+
+   RTP packets using the payload format defined in this specification
+   are subject to the security considerations discussed in the RTP
+   specification [3] and any applicable RTP profile, e.g., AVP [17].
+
+   In particular, an attacker may invalidate the current set of active
+   sample descriptions at the client by means of repeating a packet with
+   an old sample description, i.e., a replay attack.  This would mean
+   that the display of the text would be corrupted, if displayed at
+   all.
+   Another form of attack may consist of sending redundant fragments,
+   whose boundaries do not match the exact boundaries of the originals
+   (as indicated by LEN) or fragments that carry different sample
+   lengths (SLEN).  This may cause a decoder to crash.
+
+   These types of attack may easily be avoided by using source
+   authentication and integrity protection.
+
+   Additionally, peers in a timed text session may desire to retain
+   privacy in their communication, i.e., confidentiality.
+
+   This payload format does not provide any mechanisms for achieving
+   these.  Confidentiality, integrity protection, and authentication
+   have to be solved by a mechanism external to this payload format,
+   e.g., SRTP [10].
+
+12.  References
+
+12.1.  Normative References
+
+   [1]  Transparent end-to-end packet switched streaming service (PSS);
+        Timed Text Format (Release 6), TS 26.245 v 6.0.0, June 2004.
+
+   [2]  ISO/IEC 14496-12:2004 Information technology - Coding of audio-
+        visual objects - Part 12: ISO base media file format.
+
+   [3]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
+        "RTP: A Transport Protocol for Real-Time Applications", STD 64,
+        RFC 3550, July 2003.
+
+   [4]  Handley, M. and V. Jacobson, "SDP: Session Description
+        Protocol", RFC 2327, April 1998.
+
+   [5]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
+        Levels", BCP 14, RFC 2119, March 1997.
+
+   [6]  Josefsson, S., "The Base16, Base32, and Base64 Data Encodings",
+        RFC 3548, July 2003.
+
+12.2.  Informative References
+
+   [7]  Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
+        Generic Forward Error Correction", RFC 2733, December 1999.
+
+   [8]  Perkins, C. and O. Hodson, "Options for Repair of Streaming
+        Media", RFC 2354, June 1998.
+
+   [9]  W3C, "Synchronised Multimedia Integration Language (SMIL 2.0)",
+        August, 2001.
+
+   [10] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+        Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
+        3711, March 2004.
+
+   [11] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R. Hakenberg,
+        "RTP Retransmission Payload Format", Work in Progress, September
+        2005.
+
+   [12] van der Meer, J., Mackie, D., Swaminathan, V., Singer, D., and
+        P. Gentric, "RTP Payload Format for Transport of MPEG-4
+        Elementary Streams", RFC 3640, November 2003.
+
+   [13] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
+        Session Description Protocol (SDP)", RFC 3264, June 2002.
+
+   [14] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
+        Protocol", RFC 2974, October 2000.
+
+   [15] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
+        Protocol (RTSP)", RFC 2326, April 1998.
+
+   [16] Transparent end-to-end packet switched streaming service (PSS);
+        Protocols and codecs (Release 6), TS 26.234 v 6.1.0, September
+        2004.
+
+   [17] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
+        Conferences with Minimal Control", STD 65, RFC 3551, July 2003.
+
+   [18] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD
+        63, RFC 3629, November 2003.
+
+   [19] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646",
+        RFC 2781, February 2000.
+
+   [20] Friedman, T., Caceres, R., and A. Clark, "RTP Control Protocol
+        Extended Reports (RTCP XR)", RFC 3611, November 2003.
+
+   [21] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
+        "Extended RTP Profile for RTCP-based Feedback (RTP/AVPF)", Work
+        in Progress, August 2004.
+
+   [22] Hellstrom, G., "RTP Payload for Text Conversation", RFC 2793,
+        May 2000.
+
+   [23] Hellstrom, G. and P. Jones, "RTP Payload for Text Conversation",
+        RFC 4103, June 2005.
+
+
+   [24] ITU-T Recommendation T.140 (1998) - Text conversation protocol
+        for multimedia application, with amendment 1, (2000).
+
+   [25] ISO/IEC 10646-1: (1993), Universal Multiple Octet Coded
+        Character Set.
+
+   [26] ISO/IEC FCD 14496-17 Information technology - Coding of audio-
+        visual objects - Part 17: Streaming text format, Work in
+        progress, June 2004.
+
+   [27] Transparent end-to-end Packet-switched Streaming Service (PSS);
+        3GPP SMIL language profile, (Release 6), TS 26.246 v 6.0.0, June
+        2004.
+
+   [28] Casner, S. and P. Hoschka, "MIME Type Registration of RTP
+        Payload Formats", RFC 3555, July 2003.
+
+   [29] Freed, N. and J. Klensin, "Media Type Specifications and
+        Registration Procedures", BCP 13, RFC 4288, December 2005.
+
+   [30] Transparent end-to-end packet switched streaming service (PSS);
+        3GPP file format (3GP) (Release 6), TS 26.244 V6.3. March 2005.
+
+   [31] Castagno, R. and D. Singer, "MIME Type Registrations for 3rd
+        Generation Partnership Project (3GPP) Multimedia files", RFC
+        3839, July 2004.
+
+13.  Basics of the 3GP File Structure
+
+   This section provides a coarse overview of the 3GP file structure,
+   which follows the ISO Base Media File Format [2].
+
+   Each 3GP file consists of "Boxes".  In general, a 3GP file contains
+   the File Type Box (ftyp), the Movie Box (moov), and the Media Data
+   Box (mdat).  The File Type Box identifies the type and properties of
+   the 3GP file itself.  The Movie Box and the Media Data Box, serving
+   as containers, include their own boxes for each medium.  Boxes
+   start with a header, which indicates both size and type (these
+   fields are named "size" and "type", respectively).  Additionally,
+   each box type may include a number of boxes.
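+
+   As an illustration, the box structure can be walked with a short
+   Python routine.  The sketch assumes the header layout of the ISO
+   Base Media File Format [2]: a 32-bit big-endian "size" followed by
+   a four-character "type", where size 1 announces a 64-bit size and
+   size 0 means the box runs to the end of the data.  The names
+   iter_boxes and box, and the toy file contents, are hypothetical.
+
```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload_start, payload_end) for boxes in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit size follows the type field.
            if offset + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing data.
            size = end - offset
        if size < header or offset + size > end:
            break  # malformed box: stop instead of looping forever
        yield box_type.decode("ascii", "replace"), offset + header, offset + size
        offset += size


def box(box_type, payload=b""):
    """Build a box with a 32-bit size field (test data only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# A toy byte string, not a valid 3GP file: ftyp, moov (holding trak), mdat.
sample = (box(b"ftyp", b"3gp6") + box(b"moov", box(b"trak"))
          + box(b"mdat", b"\x00\x02Hi"))
top_level = [name for name, _, _ in iter_boxes(sample)]
moov = next(b for b in iter_boxes(sample) if b[0] == "moov")
children = [name for name, _, _ in iter_boxes(sample, moov[1], moov[2])]
```
+
+   Because container boxes hold further boxes, the same routine is
+   applied to the payload range of "moov" to list its children.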
+
+   In the following, only those boxes are mentioned that are useful for
+   the purposes of this payload format.
+
+   The Movie Box (moov) contains one or more Track Boxes (trak), which
+   include information about each track.  A Track Box contains, among
+   others, the Track Header Box (tkhd), the Media Header Box (mdhd), and
+   the Media Information Box (minf).
+
+   The Track Header Box specifies the characteristics of a single track,
+   where a track is, in this case, the streamed text during a session.
+   Exactly one Track Header Box is present for a track.  It contains
+   information about the track, such as the spatial layout (width and
+   height), the video transformation matrix, and the layer number.
+   Since these pieces of information are essential and static (i.e.,
+   constant) for the duration of the session, they must be sent prior to
+   the transmission of any text samples.
+
+   The Media Header Box contains the "timescale" or number of time units
+   that pass in one second, i.e., cycles per second or Hertz.  The Media
+   Information Box includes the Sample Table Box (stbl), which contains
+   all the time and data indexing of the media samples in a track. Using
+   this box, it is possible to locate samples in time and to determine
+   their type, size, container, and offset into that container. Inside
+   the Sample Table Box, we can find the Sample Description Box (stsd,
+   for finding sample descriptions), the Decoding Time to Sample Box
+   (stts, for finding sample duration), the Sample Size Box (stsz), and
+   the Sample to Chunk Box (stsc, for finding the sample description
+   index).
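+
+   Since the "timescale" gives the number of time units per second, a
+   duration read from the file can be converted to the RTP timestamp
+   clock by simple scaling.  The Python sketch below assumes the
+   1000 Hz clock rate as a default and rounds to the nearest unit;
+   the function name and the rounding choice are assumptions of this
+   example.
+
```python
from fractions import Fraction


def to_rtp_units(duration, timescale, rate=1000):
    """Convert a duration in media timescale units to RTP clock units."""
    # duration / timescale is the duration in seconds; scale it by rate.
    return round(Fraction(duration * rate, timescale))


# 3000 units at a timescale of 600 units per second are 5 seconds.
rtp_duration = to_rtp_units(3000, 600)
```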
+
+   Finally, the Media Data Box contains the media data itself.  In timed
+   text tracks, this box contains text samples.  The equivalents for
+   audio and video are audio and video frames, respectively.  The text
+   sample consists of the text length, the text string, and one or
+   several Modifier Boxes.  The text length is the size of the text in
+   bytes.
+   The text string is the plain text to render.  A Modifier Box
+   carries additional rendering information for the text, such as
+   color or font.
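+
+   The text sample layout can be illustrated with a small Python
+   parser.  The sketch rests on assumptions taken from the file format
+   rather than from this section: the text length is a 16-bit
+   big-endian integer, text beginning with a byte order mark is
+   UTF-16 and is UTF-8 otherwise, and each Modifier Box starts with
+   the usual "size" and "type" header.  The name parse_text_sample and
+   the sample bytes are hypothetical.
+
```python
import struct


def parse_text_sample(sample):
    """Split a text sample into (text, [(modifier type, payload), ...])."""
    (text_len,) = struct.unpack_from(">H", sample, 0)
    raw = sample[2:2 + text_len]
    if len(raw) != text_len:
        raise ValueError("text length exceeds sample size")
    # A leading byte order mark signals UTF-16; otherwise assume UTF-8.
    if raw[:2] in (b"\xfe\xff", b"\xff\xfe"):
        text = raw.decode("utf-16")
    else:
        text = raw.decode("utf-8")

    modifiers = []
    offset = 2 + text_len
    while offset + 8 <= len(sample):
        size, box_type = struct.unpack_from(">I4s", sample, offset)
        if size < 8 or offset + size > len(sample):
            raise ValueError("malformed modifier box")
        payload = sample[offset + 8:offset + size]
        modifiers.append((box_type.decode("ascii"), payload))
        offset += size
    return text, modifiers


# "Hi" followed by one 12-byte modifier box of type "hlit" whose
# payload holds two 16-bit values (0 and 2); the bytes are made up.
sample = b"\x00\x02Hi" + struct.pack(">I4sHH", 12, b"hlit", 0, 2)
text, modifiers = parse_text_sample(sample)
```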
+
+14.  Acknowledgements
+
+   The authors would like to thank Dave Singer, Jan van der Meer, Magnus
+   Westerlund, and Colin Perkins for their comments and suggestions
+   about this document.
+
+   The authors would also like to thank Markus Gebhard for the free and
+   publicly available JavE ASCII Editor (used for the ASCII drawings in
+   this document) and Henrik Levkowetz for the Idnits web service.
+
+Authors' Addresses
+
+   Jose Rey
+   Panasonic R&D Center Germany GmbH
+   Monzastr. 4c
+   D-63225 Langen, Germany
+
+   EMail: jose.rey@eu.panasonic.com
+   Phone: +49-6103-766-134
+   Fax:   +49-6103-766-166
+
+
+   Yoshinori Matsui
+   Matsushita Electric Industrial Co., LTD.
+   1006 Kadoma
+   Kadoma-shi, Osaka, Japan
+
+   EMail: matsui.yoshinori@jp.panasonic.com
+   Phone: +81 6 6900 9689
+   Fax:   +81 6 6900 9699
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2006).
+
+   This document is subject to the rights, licenses and restrictions
+   contained in BCP 78, and except as set forth therein, the authors
+   retain all their rights.
+
+   This document and the information contained herein are provided on an
+   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+   The IETF takes no position regarding the validity or scope of any
+   Intellectual Property Rights or other rights that might be claimed to
+   pertain to the implementation or use of the technology described in
+   this document or the extent to which any license under such rights
+   might or might not be available; nor does it represent that it has
+   made any independent effort to identify any such rights.  Information
+   on the procedures with respect to rights in RFC documents can be
+   found in BCP 78 and BCP 79.
+
+   Copies of IPR disclosures made to the IETF Secretariat and any
+   assurances of licenses to be made available, or the result of an
+   attempt made to obtain a general license or permission for the use of
+   such proprietary rights by implementers or users of this
+   specification can be obtained from the IETF on-line IPR repository at
+   http://www.ietf.org/ipr.  The IETF invites any interested party to
+   bring to its attention any copyrights, patents or patent
+   applications, or other proprietary rights that may cover technology
+   that may be required to implement this standard.  Please address the
+   information to the IETF at ietf-ipr@ietf.org.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is provided by the IETF
+   Administrative Support Activity (IASA).