summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc5584.txt
diff options
context:
space:
mode:
authorThomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committerThomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
treee3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc5584.txt
parentea76e11061bda059ae9f9ad130a9895cc85607db (diff)
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc5584.txt')
-rw-r--r--doc/rfc/rfc5584.txt1683
1 files changed, 1683 insertions, 0 deletions
diff --git a/doc/rfc/rfc5584.txt b/doc/rfc/rfc5584.txt
new file mode 100644
index 0000000..c0c0938
--- /dev/null
+++ b/doc/rfc/rfc5584.txt
@@ -0,0 +1,1683 @@
+
+
+
+
+
+
+Network Working Group M. Hatanaka
+Request for Comments: 5584 J. Matsumoto
+Category: Standards Track Sony Corporation
+ July 2009
+
+ RTP Payload Format for
+ the Adaptive TRansform Acoustic Coding (ATRAC) Family
+
+Abstract
+
+ This document describes an RTP payload format for efficient and
+ flexible transporting of audio data encoded with the Adaptive
+ TRansform Audio Coding (ATRAC) family of codecs. Recent enhancements
+ to the ATRAC family of codecs support high-quality audio coding with
+ multiple channels. The RTP payload format as presented in this
+ document also includes support for data fragmentation, elementary
+ redundancy measures, and a variation on scalable streaming.
+
+Status of This Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (c) 2009 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents in effect on the date of
+ publication of this document (http://trustee.ietf.org/license-info).
+ Please review these documents carefully, as they describe your rights
+ and restrictions with respect to this document.
+
+ This document may contain material from IETF Documents or IETF
+ Contributions published or made publicly available before November
+ 10, 2008. The person(s) controlling the copyright in some of this
+ material may not have granted the IETF Trust the right to allow
+ modifications of such material outside the IETF Standards Process.
+ Without obtaining an adequate license from the person(s) controlling
+ the copyright in such materials, this document may not be modified
+ outside the IETF Standards Process, and derivative works of it may
+ not be created outside the IETF Standards Process, except to format
+ it for publication as an RFC or to translate it into languages other
+ than English.
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 1]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+Table of Contents
+
+ 1. Introduction ....................................................3
+ 2. Conventions Used in This Document ...............................3
+ 3. Codec-Specific Details ..........................................3
+ 4. RTP Packetization and Transport of ATRAC-Family Streams .........4
+ 4.1. ATRAC Frames ...............................................4
+ 4.2. Concatenation of Frames ....................................4
+ 4.3. Frame Fragmentation ........................................4
+ 4.4. Transmission of Redundant Frames ...........................4
+ 4.5. Scalable Lossless Streaming (High-Speed Transfer Mode) .....5
+ 4.5.1. Scalable Multiplexed Streaming ......................5
+ 4.5.2. Scalable Multi-Session Streaming ....................5
+ 5. Payload Format ..................................................6
+ 5.1. Global Structure of Payload Format .........................6
+ 5.2. Usage of RTP Header Fields .................................7
+ 5.3. RTP Payload Structure ......................................8
+ 5.3.1. Usage of ATRAC Header Section .......................8
+ 5.3.2. Usage of ATRAC Frames Section .......................9
+ 6. Packetization Examples .........................................12
+ 6.1. Example Multi-Frame Packet ................................12
+ 6.2. Example Fragmented ATRAC Frame ............................13
+ 7. Payload Format Parameters ......................................14
+ 7.1. ATRAC3 Media Type Registration ............................14
+ 7.2. ATRAC-X Media Type Registration ...........................16
+ 7.3. ATRAC Advanced Lossless Media Type Registration ...........18
+ 7.4. Channel Mapping Configuration Table .......................20
+ 7.5. Mapping Media Type Parameters into SDP ....................21
+ 7.5.1. For Media Subtype ATRAC3 ...........................21
+ 7.5.2. For Media Subtype ATRAC-X ..........................21
+ 7.5.3. For Media Subtype ATRAC Advanced Lossless ..........22
+ 7.6. Offer/Answer Model Considerations .........................22
+ 7.6.1. For All Three Media Subtypes .......................22
+ 7.6.2. For Media Subtype ATRAC3 ...........................23
+ 7.6.3. For Media Subtype ATRAC-X ..........................23
+ 7.6.4. For Media Subtype ATRAC Advanced Lossless ..........23
+ 7.7. Usage of Declarative SDP ..................................24
+ 7.8. Example SDP Session Descriptions ..........................24
+ 7.9. Example Offer/Answer Exchange .............................26
+ 8. IANA Considerations ............................................28
+ 9. Security Considerations ........................................28
+ 10. Considerations on Correct Decoding ............................28
+ 10.1. Verification of the Packets ..............................28
+ 10.2. Validity Checking of the Packets .........................29
+ 11. References ....................................................29
+ 11.1. Normative References .....................................29
+ 11.2. Informative References ...................................30
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 2]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+1. Introduction
+
+ The ATRAC family of perceptual audio codecs is designed to address
+ numerous needs for high-quality, low-bit-rate audio transfer. ATRAC
+ technology can be found in many consumer and professional products
+ and applications, including MD players, CD players, voice recorders,
+ and mobile phones.
+
+ Recent advances in ATRAC technology allow for multiple channels of
+ audio to be encoded in customizable groupings. This should allow for
+ future expansions in scaled streaming to provide the greatest
+ flexibility in streaming any one of the ATRAC family member codecs;
+ however, this payload format does not distinguish between the codecs
+ on a packet level.
+
+ This simplified payload format contains only the basic information
+ needed to disassemble a packet of ATRAC audio in order to decode it.
+ There is also basic support for fragmentation and redundancy.
+
+2. Conventions Used in This Document
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in RFC 2119 [4].
+
+3. Codec-Specific Details
+
+ Early versions of the ATRAC codec handled only two channels of audio
+ at 44.1 kHz sampling frequency, with typical bit-rates between 66
+ kbps and 132 kbps. The latest version allows for a maximum of 8
+ channels of audio, up to 96 kHz in sampling frequency, and a lossless
+ encoding option that can be transmitted in either a scalable (also
+ known as High-Speed Transfer mode) or standard (aka Standard mode)
+ format. The feasible bit-rate range has also expanded, allowing from
+ a low of 8 kbps up to 1400 kbps in lossy encoding modes.
+
+ Depending on the version of ATRAC used, the sample-frame size is
+ either 512, 1024, or 2048 samples. While the lossy and Standard mode
+ lossless formats are encoded as sequential single audio frames,
+ High-Speed Transfer mode lossless data comprises two layers -- a
+ lossy base layer and an enhancement layer.
+
+ Although streaming of multi-channel audio is supported depending on
+ the ATRAC version used, all encoded audio for a given time period is
+ contained within a single frame. Therefore, there is no interleaving
+ nor splitting of audio data on a per-channel basis with which to be
+ concerned.
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 3]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+4. RTP Packetization and Transport of ATRAC-Family Streams
+
+4.1. ATRAC Frames
+
+ For transportation of compressed audio data, ATRAC uses the concept
+ of frames. ATRAC frames are the smallest data unit for which timing
+ information is attributed. Frames are octet-aligned by definition.
+
+4.2. Concatenation of Frames
+
+ It is often possible to carry multiple frames in one RTP packet.
+ This can be useful in audio, where on a LAN with a 1500-byte MTU, an
+ average of 7 complete 64 kbps ATRAC frames could be carried in a
+ single RTP packet, as each ATRAC frame would be approximately 200
+ bytes. ATRAC frames may be of fixed or variable length. To
+ facilitate parsing in the case of multiple frames in one RTP packet,
+ the size of each frame is made known to the receiver by carrying
+ "in-band" the frame size for each contained frame in an RTP packet.
+ However, to simplify the implementation of RTP receivers, it is
+ required that when multiple frames are carried in an RTP packet, each
+ frame MUST be complete, i.e., the number of frames in an RTP packet
+ MUST be integral.
+
+4.3. Frame Fragmentation
+
+ The ATRAC codec can handle very large frames. As most IP networks
+ have significantly smaller MTU sizes than the frame sizes ATRAC can
+ handle, this payload format allows for the fragmentation of an ATRAC
+ frame over multiple RTP packets. However, to simplify the
+ implementation of RTP receivers, an RTP packet MUST carry either one
+ or more complete ATRAC frames or a single fragment of one ATRAC
+ frame. In other words, RTP packets MUST NOT contain fragments of
+ multiple ATRAC frames and MUST NOT contain a mix of complete and
+ fragmented frames.
+
+4.4. Transmission of Redundant Frames
+
+ As RTP does not guarantee reliable transmission, receipt of data is
+ not assured. Loss of a packet can result in a "decoding gap" at the
+ receiver. One method to remedy this problem is to allow time-shifted
+ copies of ATRAC frames to be sent along with current data. For a
+ modest cost in latency and implementation complexity, error
+ resiliency to packet loss can be achieved. For further details, see
+ Section 5.3.2.1 and [12].
+
+
+
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 4]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+4.5. Scalable Lossless Streaming (High-Speed Transfer Mode)
+
+ As ATRAC supports a variation on scalable encoding, this payload
+ format provides a mechanism for transmitting essential data (also
+ referred to as the base layer) with its enhancement data in two ways
+ -- multiplexed through one session or separated over two sessions.
+
+ In either method, only the base layer is essential in producing audio
+ data. The enhancement layer carries the remaining audio data needed
+ to decode lossless audio data. So in situations of limited
+ bandwidth, the sender may choose not to transmit enhancement data yet
+ still provide a client with enough data to generate lossily-encoded
+ audio through the base layer.
+
+4.5.1. Scalable Multiplexed Streaming
+
+ In multiplexed streaming, the base layer and enhancement layer are
+ coupled together in each packet, utilizing only one session as
+ illustrated in Figure 1.
+
+ The packet MUST begin with the base layer, and the two layer types
+ MUST interleave if both of the layers exist in a packet (only base or
+ enhancement is included in a packet at the beginning of a streaming,
+ or during the fragmentation).
+
+ +----------------+ +----------------+ +----------------+
+ |Base|Enhancement|--|Base|Enhancement|--|Base|Enhancement| ...
+ +----------------+ +----------------+ +----------------+
+ N N+1 N+2 : Packet
+
+ Figure 1. Multiplexed Structure
+
+4.5.2. Scalable Multi-Session Streaming
+
+ In multi-session streaming, the base layer and enhancement layer are
+ sent over two separate sessions, allowing clients with certain
+ bandwidth limitations to receive just the base layer for decoding as
+ illustrated in Figure 2.
+
+ In this case, it is REQUIRED to determine which sessions are paired
+ together in receiver side. For paired base and enhancement layer
+ sessions, the CNAME bindings in the RTP Control Protocol (RTCP)
+ session MUST be applied using the same CNAME to ensure correct
+ mapping to the RTP source.
+
+
+
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 5]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ While there may be alternative methods for synchronization of the
+ layers, the timestamp SHOULD be used for synchronizing the base layer
+ with its enhancement. The two sessions MUST be synchronized using
+ the information in RTCP SR packets to align the RTP timestamps.
+
+ If the enhancement layer's session data cannot arrive until the
+ presentation time, the decoder MUST decode the base layer session's
+ data only, ignoring the enhancement layer's data.
+
+ Session 1:
+ +------+ +------+ +------+ +------+
+ | Base |--| Base |--| Base |--| Base | ...
+ +------+ +------+ +------+ +------+
+ N N+1 N+2 N+3 : Packet
+
+ Session 2:
+ +-------------+ +-------------+ +-------------+
+ | Enhancement |--| Enhancement |--| Enhancement | ...
+ +-------------+ +-------------+ +-------------+
+ N N+1 N+2 : Packet
+
+ Figure 2. Multi-Session Streaming
+
+5. Payload Format
+
+5.1. Global Structure of Payload Format
+
+ The structure of ATRAC Payload is illustrated in Figure 3. The RTP
+ payload following the RTP header contains two octet-aligned data
+ sections.
+
+ +------+--------------+-----------------------------+
+ |RTP | ATRAC Header | ATRAC Frames Section |
+ |Header| Section | (including redundant data) |
+ +------+--------------+-----------------------------+
+ < ---------------- RTP Packet Payload ------------- >
+
+ Figure 3. Structure of RTP Payload of ATRAC Family
+
+ The first data section is the ATRAC Header, containing just one
+ header with information for the whole packet. The second section is
+ where the encoded ATRAC frames are stored. This may contain either a
+ single fragment of one ATRAC frame or one or more complete ATRAC
+ frames. The ATRAC Frames Section MUST NOT be empty. When using the
+ redundancy mechanism described in Section 5.3.2.1, the redundant
+ frame data can be included in this section and timestamp MUST be set
+ to the oldest redundant frame's timestamp.
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 6]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ To benefit from ATRAC's High-Speed Transfer mode lossless encoding
+ capability, the RTP payload can be split across two sessions, with
+ one transmitting an essential base layer and the other transmitting
+ enhancement data. However, in either case, the above structure still
+ applies.
+
+5.2. Usage of RTP Header Fields
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |V=2|P|X| CC |M| PT | sequence number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | timestamp |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | synchronization source (SSRC) identifier |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | contributing source (CSRC) identifiers |
+ | ..... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 4. RTP Standard Header Part
+
+ The structure of the RTP Standard Header Part is illustrated in
+ Figure 4.
+
+ Version(V): 2 bits
+ Set to 2.
+
+ Padding(P): 1 bit
+ If the padding bit is set, the packet contains one or more additional
+ padding octets at the end, which are not part of the payload. The
+ last octet of the padding contains a count of how many padding octets
+ should be ignored, including itself. Padding may be needed by some
+ encryption algorithms with fixed block sizes or for carrying several
+ RTP packets in a lower-layer protocol data unit (see [1]).
+
+ Extension(X): 1 bit
+ Defined by the RTP profile used.
+
+ CSRC count(CC): 4 bits
+ See RFC 3550 [1].
+
+ Marker (M): 1 bit
+ Set to 1 if the packet is the first packet after a silence period;
+ otherwise, it MUST be set to 0.
+
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 7]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ Payload Type (PT): 7 bits
+ The assignment of an RTP payload type for this packet format is
+ outside the scope of this document; it is specified by the RTP
+ profile under which this payload format is used, or signaled
+ dynamically out-of-band (e.g., using the Session Description Protocol
+ (SDP)).
+
+ sequence number: 16 bits
+ A sequential number for the RTP packet. It ranges from 0 to 65535
+ and repeats itself periodically.
+
+ Timestamp: 32 bits
+ A timestamp representing the sampling time of the first sample of the
+ first ATRAC frame in the current RTP packet.
+ When using SDP, the clock rate of the RTP timestamp MUST be expressed
+ using the "rtpmap" attribute. For ATRAC3 and ATRAC Advanced
+ Lossless, the RTP timestamp rate MUST be 44100 Hz. For ATRAC-X, the
+ RTP timestamp rate is 44100 Hz or 48000 Hz, and it will be selected
+ by out-of-band signaling.
+
+ SSRC: 32 bits
+ See RFC 3550 [1].
+
+ CSRC list: 0 to 15 items, 32 bits each
+ See RFC 3550 [1].
+
+5.3. RTP Payload Structure
+
+5.3.1. Usage of ATRAC Header Section
+
+ The ATRAC header section has the fixed length of one byte as
+ illustrated in Figure 5.
+
+ 0 1 2 3 4 5 6 7
+ +-+-+-+-+-+-+-+-+
+ |C|FrgNo|NFrames|
+ +-+-+-+-+-+-+-+-+
+
+ Figure 5. ATRAC RTP Header
+
+ Continuation Flag (C) : 1 bit
+ The packet that corresponds to the last part of the audio frame data
+ in a fragmentation MUST have this bit set to 0; otherwise, it's set
+ to 1.
+
+ Fragment Number (FrgNo): 3 bits
+ In the event of data fragmentation, this value is one for the first
+ packet, and increases sequentially for the remaining fragmented data
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 8]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ packets. This value MUST be zero for an unfragmented frame. (Note:
+ 3 bits is sufficient to avoid Fragment Number rollover given the
+ current maximum supported bit-rate in the ATRAC specification. If
+ that changes, the choice of 3 bits for the Fragment Number should be
+ revisited.)
+
+ Number of Frames (NFrames): 4 bits
+ The number of audio frames in this packet are field value + 1. This
+ allows for a maximum of 16 ATRAC-encoded audio frames per packet,
+ with 0 indicating one audio frame. Each audio frame MUST be complete
+ in the packet if fragmentation is not applied. In the case of
+ fragmentation, the data for only one audio frame is allowed to be
+ fragmented, and this value MUST be 0.
+
+5.3.2. Usage of ATRAC Frames Section
+
+ The ATRAC Frames Section contains an integer number of complete ATRAC
+ frames or a single fragment of one ATRAC frame, as illustrated in
+ Figure 6. Each ATRAC frame is preceded by a one-bit flag indicating
+ the layer type and a Block Length field indicating the size in bytes
+ of the ATRAC frame. If more than one ATRAC frame is present, then
+ the frames are concatenated into a contiguous string of bit-flag,
+ Block Length, and ATRAC frame in order of their frame number. This
+ section MUST NOT be empty.
+
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Block Length | ATRAC frame |...
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 6. ATRAC Frame Section Format
+
+ Layer Type Flag (E): 1 bit
+ Set to 1 if the corresponding ATRAC frame is from an enhancement
+ layer. 0 indicates a base layer encoded frame.
+
+ Block length: 15 bits
+ The byte length of encoded audio data for the following frame. This
+ is so that in the case of fragmentation, if only a subsequent packet
+ is received, decoding can still occur. 15 bits allows for a maximum
+ block length of 32,767 bytes.
+
+ ATRAC frame: The encoded ATRAC audio data.
+
+
+
+
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 9]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+5.3.2.1. Support of Redundancy
+
+ This payload format provides a rudimentary scheme to compensate for
+ occasional packet loss. As every packet's timestamp corresponds to
+ the first audio frame regardless of whether or not it is redundant,
+ and because we know how many frames of audio each packet
+ encapsulates, if two successive packets are successfully transmitted,
+ we can calculate the number of redundant frames being sent. The
+ result gives the client a sense of how the server is responding to
+ RTCP reports and warns it to expand its buffer size if necessary. As
+ an example of using the Redundant Data, refer to Figures 7 and 8.
+
+ In this example, the server has determined that for the next few
+ packets, it should send the last two frames from the previous packet
+ due to recent RTCP reports. Thus, between packets N and N+1, there
+ is a redundancy of two frames (of which the client may choose to
+ dispose). The benefit arises when packets N+2 and N+3 do not arrive
+ at all, after which eventually packet N+4 arrives with successive
+ necessary audio frame data.
+
+ [Sender]
+
+ |-Fr0-|-Fr1-|-Fr2-| Packet: N, TS=0
+ |-Fr1-|-Fr2-|-Fr3-| Packet: N+1, TS=1024
+ |-Fr2-|-Fr3-|-Fr4-| Packet: N+2, TS=2048
+ |-Fr3-|-Fr4-|-Fr5-| Packet: N+3, TS=3072
+ |-Fr4-|-Fr5-|-Fr6-| Packet: N+4, TS=4096
+
+ -----------> Packet "N+2" and "N+3" not arrived ------------->
+
+ [Receiver]
+
+ |-Fr0-|-Fr1-|-Fr2-| Packet: N, TS=0
+ |-Fr1-|-Fr2-|-Fr3-| Packet: N+1, TS=1024
+ |-Fr4-|-Fr5-|-Fr6-| Packet: N+4, TS=4096
+
+ The receiver can decode from FR4 to Fr6 by using Packet "N+4" data
+ even if the packet loss of "N+2" and "N+3" has occurred.
+
+ Figure 7. Redundant Example
+
+
+
+
+
+
+
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 10]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |V=2|P|X| CC |M| PT | sequence number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | timestamp (= start sample time of Fr1) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | synchronization source (SSRC) identifier |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | contributing source (CSRC) identifiers |
+ | ..... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0| 0 | 3 |0| Block Length | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | (redundant) ATRAC frame (Fr1) data ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0| Block Length |(redundant) ATRAC frame (Fr2) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | (cont.) |0| Block Length | ATRAC frame (Fr3) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | (cont.) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 8. Packet Structure Example with Redundant Data
+ (Case of Packet "N+1")
+
+5.3.2.2. Frame Fragmentation
+
+ Each RTP packet MUST contain either an integer number of ATRAC-
+ encoded audio frames (with a maximum of 16) or one ATRAC frame
+ fragment. In the former case, as many complete ATRAC frames as can
+ fit in a single path-MTU SHOULD be placed in an RTP packet. However,
+ if even a single ATRAC frame will not fit into a complete RTP packet,
+ the ATRAC frame MUST be fragmented.
+
+ The start of a fragmented frame gets placed in its own RTP packet
+ with its Continuation bit (C) set to one, and its Fragment Number
+ (FragNo) set to one. As the frame must be the only one in the
+ packet, the Number of Frames field is zero. Subsequent packets are
+ to contain the remaining fragmented frame data, with the Fragment
+ Number increasing sequentially and the Continuation bit (C)
+ consistently set to one. As subsequent packets do not contain any
+ new frames, the Number of Frames field MUST be ignored. The last
+ packet of fragmented data MUST have the Continuation bit (C) set to
+ zero.
+
+
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 11]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ Packets containing related fragmented frames MUST have identical
+ timestamps. Thus, while the Continuous bit and Fragment Number
+ fields indicate fragmentation and a means to reorder the packets, the
+ timestamp can be used to determine which packets go together.
+
+6. Packetization Examples
+
+6.1. Example Multi-Frame Packet
+
+ Multiple encoded audio frames are combined into one packet. Note
+ how, for this example, only base layer frames are sent redundantly,
+ but are followed by interleaved base layer and enhancement layer
+ frames as illustrated in Figure 9.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |V=2|P|X| CC |M| PT | sequence number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | timestamp |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | synchronization source (SSRC) identifier |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | contributing source (CSRC) identifiers |
+ | ..... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0| 0 | 5 |0| Block Length | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | (redundant) base layer frame 1 data... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0| Block Length |(redundant) base layer frame 2 |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | (cont.) |0| Block Length | base layer frame 3 |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | (cont.) |1| Block Length | enhancement frame 3 |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | (cont.) |0| Block Length | base layer frame 4 |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | (cont.) |1| Block Length | enhancement frame 4 |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 9. Example Multi-Frame Packet
+
+
+
+
+
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 12]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+6.2. Example Fragmented ATRAC Frame
+
+ The encoded audio data frame is split over three RTP packets as
+ illustrated in Figure 10. The following points are highlighted in
+ the example below:
+
+ o transition from one to zero of the Continuation bit (C)
+
+ o sequential increase in the Fragment Number
+
+ Packet 1:
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |V=2|P|X| CC |M| PT | sequence number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | timestamp |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | synchronization source (SSRC) identifier |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | contributing source (CSRC) identifiers |
+ | ..... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |1| 1 | 0 |1| Block Length | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | enhancement data... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Packet 2:
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |V=2|P|X| CC |M| PT | sequence number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | timestamp |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | synchronization source (SSRC) identifier |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | contributing source (CSRC) identifiers |
+ | ..... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |1| 2 | 0 |1| Block Length | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | ...more enhancement data... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 13]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ Packet 3:
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |V=2|P|X| CC |M| PT | sequence number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | timestamp |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | synchronization source (SSRC) identifier |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | contributing source (CSRC) identifiers |
+ | ..... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0| 3 | 0 |1| Block Length | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | ...the last of the enhancement data |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 10. Example Fragmented ATRAC Frame
+
+7. Payload Format Parameters
+
+ Certain parameters will need to be defined before ATRAC-family-
+ encoded content can be streamed. Other optional parameters may also
+ be defined to take advantage of specific features relevant to certain
+ ATRAC versions. Parameters for ATRAC3, ATRAC-X, and ATRAC Advanced
+ Lossless are defined here as part of the media subtype registration
+ process. A mapping of these parameters into the Session Description
+ Protocol (SDP) (RFC 4566) [2] is also provided for applications that
+ utilize SDP. These registrations use the template defined in RFC
+ 4288 [5] and follow RFC 4855 [6].
+
+ The data format and parameters are specified for real-time transport
+ in RTP.
+
+7.1. ATRAC3 Media Type Registration
+
+ The media subtype for the Adaptive TRansform Codec version 3 (ATRAC3)
+ uses the template defined in RFC 4855 [6].
+
+ Note, any unknown parameter MUST be ignored by the receiver.
+
+ Type name: audio
+
+ Subtype name: ATRAC3
+
+
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 14]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ Required parameters:
+
+ rate: Represents the sampling frequency in Hz of the original audio
+ data. Permissible value is 44100 only.
+
+ baseLayer: Indicates the encoded bit-rate in kbps for the audio data
+ to be streamed. Permissible values are 66, 105, and 132.
+
+ Optional parameters:
+
+ ptime: See RFC 4566 [2].
+
+ maxptime: See RFC 4566 [2].
+ The frame length of ATRAC3 is 1024/44100 = 23.22...(ms), and
+ fractional value may not be applicable for the SDP definition.
+
+ So the value of the parameter MUST be a multiple of 24 (ms)
+ considering safe transmission.
+
+ If this parameter is not present, the sender MAY encapsulate a
+ maximum of 6 encoded frames into one RTP packet, in streaming of
+ ATRAC3.
+
+ maxRedundantFrames: The maximum number of redundant frames that may
+ be sent during a session in any given packet under the redundant
+ framing mechanism detailed in the document. Allowed values are
+ integers in the range of 0 to 15, inclusive. If this parameter is
+ not used, a default of 15 MUST be assumed.
+
+ Encoding considerations: This media type is framed and contains
+ binary data.
+
+ Security considerations: This media type does not carry active
+ content. See Section 9 of this document.
+
+ Interoperability considerations: none
+
+ Published specification: ATRAC3 Standard Specification [9]
+
+ Applications that use this media type:
+ Audio and video streaming and conferencing tools.
+
+ Additional information: none
+ Magic number(s): none
+ File extension(s): 'at3', 'aa3', and 'omg'
+ Macintosh file type code(s): none
+
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 15]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ Person and email address to contact for further information:
+ Mitsuyuki Hatanaka
+ Jun Matsumoto
+ actech@jp.sony.com
+
+ Intended usage: COMMON
+
+ Restrictions on usage: This media type depends on RTP framing, and
+ hence is only defined for transfer via RTP.
+
+ Author:
+ Mitsuyuki Hatanaka
+ Jun Matsumoto
+ actech@jp.sony.com
+
+ Change controller: IETF AVT WG delegated from the IESG
+
+7.2. ATRAC-X Media Type Registration
+
+ The media subtype for the Adaptive TRansform Codec version X
+ (ATRAC-X) uses the template defined in RFC 4855 [6].
+
+ Note, any unknown parameter MUST be ignored by the receiver.
+
+ Type name: audio
+
+ Subtype name: ATRAC-X
+
+ Required parameters:
+
+ rate: Represents the sampling frequency in Hz of the original
+ audio data. Permissible values are 44100 and 48000.
+
+ baseLayer: Indicates the encoded bit-rate in kbps for the audio data
+ to be streamed. Permissible values are 32, 48, 64, 96, 128, 160,
+ 192, 256, 320, and 352.
+
+ channelID: Indicates the number of channels and channel layout
+ according to the table1 in Section 7.4. Note that this layout is
+ different from that proposed in RFC 3551 [3]. However, as channelID
+ = 0 defines an ambiguous channel layout, the channel mapping defined
+ in Section 4.1 of [3] could be used. Permissible values are 0, 1, 2,
+ 3, 4, 5, 6, 7.
+
+ Optional parameters:
+
+ ptime: See RFC 4566 [2].
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 16]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ maxptime: See RFC 4566 [2].
+ The frame length of ATRAC-X is 2048/44100 = 46.44...(ms) or
+ 2048/48000 = 42.67...(ms), but fractional value may not be applicable
+ for the SDP definition. So the value of the parameter MUST be a
+ multiple of 47 (ms) or 43 (ms) considering safe transmission.
+
+ If this parameter is not present, the sender MAY encapsulate a
+ maximum of 16 encoded frames into one RTP packet, in streaming of
+ ATRAC-X.
+
+ maxRedundantFrames: The maximum number of redundant frames that may
+ be sent during a session in any given packet under the redundant
+ framing mechanism detailed in the document. Allowed values are
+ integers in the range 0 to 15, inclusive. If this parameter is not
+ used, a default of 15 MUST be assumed.
+
+ delayMode: Indicates a desire to use low-delay features, in which
+ case the decoder will process received data accordingly based on this
+ value. Permissible values are 2 and 4.
+
+ Encoding considerations: This media type is framed and contains
+ binary data.
+
+ Security considerations: This media type does not carry active
+ content. See Section 9 of this document.
+
+ Interoperability considerations: none
+
+ Published specification: ATRAC-X Standard Specification [10]
+
+ Applications that use this media type:
+ Audio and video streaming and conferencing tools.
+
+ Additional information: none
+
+ Magic number(s): none
+ File extension(s): 'atx', 'aa3', and 'omg'
+ Macintosh file type code(s): none
+
+ Person and email address to contact for further information:
+ Mitsuyuki Hatanaka
+ Jun Matsumoto
+ actech@jp.sony.com
+
+ Intended usage: COMMON
+
+ Restrictions on usage: This media type depends on RTP framing, and
+ hence is only defined for transfer via RTP.
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 17]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ Author:
+ Mitsuyuki Hatanaka
+ Jun Matsumoto
+ actech@jp.sony.com
+
+ Change controller: IETF AVT WG delegated from the IESG
+
+7.3. ATRAC Advanced Lossless Media Type Registration
+
+ The media subtype for the Adaptive TRansform Codec Lossless version
+ (ATRAC Advanced Lossless) uses the template defined in RFC 4855 [6].
+
+ Note, any unknown parameter MUST be ignored by the receiver.
+
+ Type name: audio
+
+ Subtype name: ATRAC-ADVANCED-LOSSLESS
+
+ Required parameters:
+
+ rate: Represents the sampling frequency in Hz of the original
+ audio data. Permissible value is 44100 only for High-Speed Transfer
+ mode. Any value of 24000, 32000, 44100, 48000, 64000, 88200, 96000,
+ 176400, and 192000 can be used for Standard mode.
+
+ baseLayer: Indicates the encoded bit-rate in kbps for the base layer
+ in High-Speed Transfer mode lossless encodings.
+
+ For Standard lossless mode, this value MUST be 0.
+
+ The Permissible values for ATRAC3 baselayer are 66, 105, and 132.
+ For ATRAC-X baselayer, they are 32, 48, 64, 96, 128, 160, 192, 256,
+ 320, and 352.
+
+ blockLength: Indicates the block length. In High-Speed Transfer
+ mode, the value of 1024 and 2048 is used for ATRAC3 based and ATRAC-X
+ based ATRAC Advanced Lossless streaming, respectively.
+
+ Any value of 512, 1024, and 2048 can be used for Standard mode.
+
+ channelID: Indicates the number of channels and channel layout
+ according to the table1 in Section 7.4. Note that this layout is
+ different from that proposed in RFC 3551 [3]. However, as channelID
+ = 0 defines an ambiguous channel layout, the channel mapping defined
+ in Section 4.1 of [3] could be used in this case. Permissible values
+ are 0, 1, 2, 3, 4, 5, 6, 7.
+
+ ptime: See RFC 4566 [2].
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 18]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ maxptime: See RFC 4566 [2].
+ In streaming of ATRAC Advanced Lossless, multiple frames cannot be
+ transmitted in a single RTP packet, as the frame size is large. So
+ it SHOULD be regarded as the time of one encoded frame in both of the
+ sender and the receiver side. The frame length of ATRAC Advanced
+ Lossless is 512/44100 = 11.6...(ms), 1024/44100 = 23.22...(ms), or
+ 2048/44100 = 46.44...(ms), but fractional value may not be applicable
+ for the SDP definition. So the value of the parameter MUST be
+ 12(ms), 24(ms), or 47(ms) considering safe transmission.
+
+ Encoding considerations: This media type is framed and contains
+ binary data.
+
+ Security considerations: This media type does not carry active
+ content. See Section 9 of this document.
+
+ Interoperability considerations: none
+
+ Published specification:
+ ATRAC Advanced Lossless Standard Specification [11]
+
+ Applications that use this media type:
+ Audio and video streaming and conferencing tools.
+
+ Additional information: none
+
+ Magic number(s): none
+ File extension(s): 'aal', 'aa3', and 'omg'
+ Macintosh file type code(s): none
+
+ Person and email address to contact for further information:
+
+ Mitsuyuki Hatanaka
+ Jun Matsumoto
+ actech@jp.sony.com
+
+ Intended usage: COMMON
+
+ Restrictions on usage: This media type depends on RTP framing, and
+ hence is only defined for transfer via RTP.
+
+ Author:
+ Mitsuyuki Hatanaka
+ Jun Matsumoto
+ actech@jp.sony.com
+
+ Change controller: IETF AVT WG delegated from the IESG
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 19]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+7.4. Channel Mapping Configuration Table
+
+ Table 1 explains the mapping between the channelID as passed during
+ SDP negotiations, and the speaker mapping the value represents.
+
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | channelID | Number of | Default Speaker |
+ | | Channels | Mapping |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | 0 | max 64 | undefined |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | 1 | 1 | front: center |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | 2 | 2 | front: left, right |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | 3 | 3 | front: left, right |
+ | | | front: center |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | 4 | 4 | front: left, right |
+ | | | front: center |
+ | | | rear: surround |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | 5 | 5+1 | front: left, right |
+ | | | front: center |
+ | | | rear: left, right |
+ | | | LFE |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | 6 | 6+1 | front: left, right |
+ | | | front: center |
+ | | | rear: left, right |
+ | | | rear: center |
+ | | | LFE |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | 7 | 7+1 | front: left, right |
+ | | | front: center |
+ | | | rear: left, right |
+ | | | side: left, right |
+ | | | LFE |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Table 1. Channel Configuration
+
+
+
+
+
+
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 20]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+7.5. Mapping Media Type Parameters into SDP
+
+ The information carried in the Media type specification has a
+ specific mapping to fields in the Session Description Protocol (SDP)
+ [2], which is commonly used to describe RTP sessions. When SDP is
+ used to specify sessions employing the ATRAC family of codecs, the
+ following mapping rules according to the ATRAC codec apply.
+
+7.5.1. For Media Subtype ATRAC3
+
+ o The Media type ("audio") goes in SDP "m=" as the media name.
+
+ o The Media subtype (payload format name) goes in SDP "a=rtpmap" as
+ the encoding name. ATRAC3 supports only mono or stereo signals,
+ so a corresponding number of channels (0 or 1) MUST also be
+ specified in this attribute.
+
+ o The "baseLayer" parameter goes in SDP "a=fmtp". This parameter
+ MUST be present. "maxRedundantFrames" may follow, but if no value
+ is transmitted, the receiver SHOULD assume a default value of
+ "15".
+
+ o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
+ "a=maxptime" attributes, respectively.
+
+7.5.2. For Media Subtype ATRAC-X
+
+ o The Media type ("audio") goes in SDP "m=" as the media name.
+
+ o The Media subtype (payload format name) goes in SDP "a=rtpmap" as
+ the encoding name. This SHOULD be followed by the "sampleRate"
+ (as the RTP clock rate), and then the actual number of channels
+ regardless of the channelID parameter.
+
+ o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
+ "a=maxptime" attributes, respectively.
+
+ o Any remaining parameters go in the SDP "a=fmtp" attribute by
+ copying them directly from the Media type string as a semicolon-
+ separated list of parameter=value pairs. The "baseLayer"
+ parameter MUST be the first entry on this line. The "channelID"
+ parameter MUST be the next entry. The receiver MUST assume a
+ default value of "15" for "maxRedundantFrames".
+
+
+
+
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 21]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+7.5.3. For Media Subtype ATRAC Advanced Lossless
+
+ o The Media type ("audio") goes in SDP "m=" as the media name.
+
+ o The Media subtype (payload format name) goes in SDP "a=rtpmap" as
+ the encoding name. This MUST be followed by the "sampleRate" (as
+ the RTP clock rate), and then the actual number of channels
+ regardless of the channelID parameter.
+
+ o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
+ "a=maxptime" attributes, respectively.
+
+ o Any remaining parameters go in the SDP "a=fmtp" attribute by
+ copying them directly from the Media type string as a semicolon-
+ separated list of parameter=value pairs.
+
+ On this line, the parameters "baseLayer" and "blockLength" MUST be
+ present in this order.
+
+ The value of "blockLength" MUST be one of 1024 and 2048, for using
+ ATRAC3 and ATRAC-X as baselayer, respectively. If "baseLayer=0"
+ (means standard mode), "blockLength" MUST be one of either 512,
+ 1024, or 2048. The "channelID" parameter MUST be the next entry .
+ The receiver MUST assume a default value of "15" for
+ "maxRedundantFrames".
+
+7.6. Offer/Answer Model Considerations
+
+ Some options for encoding and decoding ATRAC audio data will require
+ either or both of the sender and receiver complying with certain
+ specifications. In order to establish an interoperable transmission
+ framework, an Offer/Answer negotiation in SDP MUST observe the
+ following considerations. (See [14].)
+
+7.6.1. For All Three Media Subtypes
+
+ o Each combination of the RTP payload transport format configuration
+ parameters (baseLayer and blockLength, sampleRate, channelID) is
+ unique in its bit-pattern and not compatible with any other
+ combination. When creating an offer in an application desiring to
+ use the more advanced features (sample rates above 44100 kHz, more
+ than two channels), the offerer SHOULD also offer a payload type
+ containing only the lowest set of necessary requirements. If
+ multiple configurations are of interest to the application, they
+ may all be offered.
+
+
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 22]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ o The parameters "maxptime" and "ptime" will in most cases not
+ affect interoperability; however, the setting of the parameters
+ can affect the performance of the application. The SDP
+ Offer/Answer handling of the "ptime" parameter is described in RFC
+ 3264. The "maxptime" parameter MUST be handled in the same way.
+
+7.6.2. For Media Subtype ATRAC3
+
+ o In response to an offer, downgraded subsets of "baseLayer" are
+ possible. However, for best performance, we suggest the answer
+ contain the highest possible values offered.
+
+7.6.3. For Media Subtype ATRAC-X
+
+ o In response to an offer, downgraded subsets of "sampleRate",
+ "baseLayer", and "channelID" are possible. For best performance,
+ an answer MUST NOT contain any values requiring further
+ capabilities than the offer contains, but it SHOULD provide values
+ as close as possible to those in the offer.
+
+ o The "maxRedundantFrames" is a suggested minimum. This value MAY
+ be increased in an answer (with a maximum of 15), but MUST NOT be
+ reduced.
+
+ o The optional parameter "delayMode" is non-negotiable. If the
+ Answerer cannot comply with the offered value, the session MUST be
+ deemed inoperable.
+
+7.6.4. For Media Subtype ATRAC Advanced Lossless
+
+ o In response to an offer, downgraded subsets of "sampleRate",
+ "baseLayer", and "channelID" are possible. For best performance,
+ an answer MUST NOT contain any values requiring further
+ capabilities than the offer contains, but it SHOULD provide values
+ as close as possible to those in the offer.
+
+ o There are no requirements when negotiating "blockLength", other
+ than that both parties must be in agreement.
+
+ o The "maxRedundantFrames" is a suggested minimum. This value MAY
+ be increased in an answer (with a maximum of 15), but MUST NOT be
+ reduced.
+
+ o For transmission of scalable multi-session streaming of ATRAC
+ Advanced Lossless content, the attributes of media stream
+ identification, group information, and decoding dependency between
+ base layer stream and enhancement layer stream MUST be signaled in
+ SDP by the Offer/Answer model. In this case, the attribute of
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 23]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ "group", "mid", and "depend" followed by the appropriate parameter
+ MUST be used in SDP [7] [8] in order to indicate layered coding
+ dependency. The attribute of "group" followed by "DDP" parameter
+ is used for indicating the relationship between the base and the
+ enhancement layer stream with decoding dependency. Each stream is
+ identified by "mid" attribute, and the dependency of enhancement
+ layer stream is defined by the "depend" attribute, as the
+ enhancement layer is only useful when the base layer is available.
+ Examples for signaling ATRAC Advanced Lossless decoding dependency
+ are described in Sections 7.8 and 7.9.
+
+7.7. Usage of Declarative SDP
+
+ In declarative usage, like SDP in Real-Time Streaming Protocol (RTSP)
+ [15] or Session Announcement Protocol (SAP) [16], the parameters MUST
+ be interpreted as follows:
+
+ o The payload format configuration parameters (baseLayer,
+ sampleRate, channelID) are all declarative and a participant MUST
+ use the configuration(s) provided for the session. More than one
+ configuration may be provided if necessary by declaring multiple
+ RTP payload types; however, the number of types SHOULD be kept
+ small.
+
+ o Any "maxptime" and "ptime" values SHOULD be selected with care to
+ ensure that the session's participants can achieve reasonable
+ performance.
+
+ o The attribute of "mid", "group", and "depend" MUST be used for
+ indicating the relationship and dependency of the base layer and
+ the enhancement layer in scalable multi-session streaming of ATRAC
+ ADVANCED LOSSLESS content, as described in Sections 7.6, 7.8, and
+ 7.9.
+
+7.8. Example SDP Session Descriptions
+
+ Example usage of ATRAC-X with stereo at 44100 Hz:
+
+ v=0
+ o=atrac 2465317890 2465317890 IN IP4 service.example.com
+ s=ATRAC-X Streaming
+ c=IN IP4 192.0.2.1/127
+ t=3409539540 3409543140
+ m=audio 49120 RTP/AVP 99
+ a=rtpmap:99 ATRAC-X/44100/2
+ a=fmtp:99 baseLayer=128; channelID=2; delayMode=2
+ a=maxptime:47
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 24]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ Example usage of ATRAC-X with 5.1 setup at 48000 Hz:
+
+ v=0
+ o=atrac 2465317890 2465317890 IN IP4 service.example.com
+ s=ATRAC-X 5.1ch Streaming
+ c=IN IP4 192.0.2.1/127
+ t=3409539540 3409543140
+ m=audio 49120 RTP/AVP 99
+ a=rtpmap:99 ATRAC-X/48000/6
+ a=fmtp:99 baseLayer=320; channelID=5
+ a=maxptime:43
+
+ Example usage of ATRAC-Advanced-Lossless in multiplexed
+ High-Speed Transfer mode:
+
+ v=0
+ o=atrac 2465317890 2465317890 IN IP4 service.example.com
+ s=AAL Multiplexed Streaming
+ c=IN IP4 192.0.2.1/127
+ t=3409539540 3409543140
+ m=audio 49200 RTP/AVP 96
+ a=rtpmap:96 ATRAC-ADVANCED-LOSSLESS/44100/2
+ a=fmtp:96 baseLayer=128; blockLength=2048; channelID=2
+ a=maxptime:47
+
+ Example usage of ATRAC-Advanced-Lossless in multi-session High-Speed
+ Transfer mode. In this case, the base layer and the enhancement
+ layer stream are identified by L1 and L2, respectively, and L2
+ depends on L1 in decoding.
+
+ v=0
+ o=atrac 2465317890 2465317890 IN IP4 service.example.com
+ s=AAL Multi Session Streaming
+ c=IN IP4 192.0.2.1/127
+ t=3409539540 3409543140
+ a=group:DDP L1 L2
+ m=audio 49200 RTP/AVP 96
+ a=rtpmap:96 ATRAC-ADVANCED-LOSSLESS/44100/2
+ a=fmtp:96 baseLayer=128; blockLength=2048; channelID=2
+ a=maxptime:47
+ a=mid:L1
+ m=audio 49202 RTP/AVP 97
+ a=rtpmap:97 ATRAC-ADVANCED-LOSSLESS/44100/2
+ a=fmtp:97 baseLayer=0; blockLength=2048; channelID=2
+ a=maxptime:47
+ a=mid:L2
+ a=depend:97 lay L1:96
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 25]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ Example usage of ATRAC-Advanced-Lossless in Standard mode:
+
+ m=audio 49200 RTP/AVP 99
+ a=rtpmap:99 ATRAC-ADVANCED-LOSSLESS/44100/2
+ a=fmtp:99 baseLayer=0; blockLength=1024; channelID=2
+ a=maxptime:24
+
+7.9. Example Offer/Answer Exchange
+
+ The following Offer/Answer example shows how a desire to stream
+ multi-channel content is turned down by the receiver, who answers
+ with only the ability to receive stereo content:
+
+ Offer:
+
+ m=audio 49170 RTP/AVP 98 99
+ a=rtpmap:98 ATRAC-X/44100/6
+ a=fmtp:98 baseLayer=320; channelID=5
+ a=rtpmap:99 ATRAC-X/44100/2
+ a=fmtp:99 baseLayer=160; channelID=2
+
+ Answer:
+
+ m=audio 49170 RTP/AVP 99
+ a=rtpmap:99 ATRAC-X/44100/2
+ a=fmtp:99 baseLayer=160; channelID=2
+
+ The following Offer/Answer example shows the receiver answering with
+ a selection of supported parameters:
+
+ Offer:
+
+ m=audio 49170 RTP/AVP 97 98 99
+ a=rtpmap:97 ATRAC-X/44100/2
+ a=fmtp:97 baseLayer=128; channelID=2
+ a=rtpmap:98 ATRAC-X/44100/6
+ a=fmtp:98 baseLayer=128; channelID=5
+ a=rtpmap:99 ATRAC-X/48000/6
+ a=fmtp:99 baseLayer=320; channelID=5
+
+ Answer:
+
+ m=audio 49170 RTP/AVP 97 98
+ a=rtpmap:97 ATRAC-X/44100/2
+ a=fmtp:97 baseLayer=128; channelID=2
+ a=rtpmap:98 ATRAC-X/44100/6
+ a=fmtp:98 baseLayer=128; channelID=5
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 26]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ The following Offer/Answer example shows an exchange in trying to
+ resolve using ATRAC-Advanced-Lossless. The offer contains three
+ options: multi-session High-Speed Transfer mode, multiplexed High-
+ Speed Transfer mode, and Standard mode.
+
+ Offer:
+
+// Multi-session High-Speed Transfer mode, L1 and L2 correspond
+ to the base layer and the enhancement layer, respectively, and L2
+ depends on L1 in decoding.
+
+ a=group:DDP L1 L2
+ m=audio 49200 RTP/AVP 96
+ a=rtpmap:96 ATRAC-ADVANCED-LOSSLESS/44100/2
+ a=fmtp:96 baseLayer=132; blockLength=1024; channelID=2
+ a=maxptime:24
+ a=mid:L1
+
+ m=audio 49202 RTP/AVP 97
+ a=rtpmap:97 ATRAC-ADVANCED-LOSSLESS/44100/2
+ a=fmtp:97 baseLayer=0; blockLength=2048; channelID=2
+ a=maxptime:24
+ a=mid:L2
+ a=depend:97 lay L1:96
+
+// Multiplexed High-Speed Transfer mode
+ m=audio 49200 RTP/AVP 98
+ a=rtpmap:98 ATRAC-ADVANCED-LOSSLESS/44100/2
+ a=fmtp:98 baseLayer=256; blockLength=2048; channelID=2
+ a=maxptime:47
+
+// Standard mode
+ m=audio 49200 RTP/AVP 99
+ a=rtpmap:99 ATRAC-ADVANCED-LOSSLESS/44100/2
+ a=fmtp:99 baseLayer=0; blockLength=2048; channelID=2
+ a=maxptime:47
+
+ Answer:
+
+ a=group:DDP L1 L2
+ m=audio 49200 RTP/AVP 94
+ a=rtpmap:94 ATRAC-ADVANCED-LOSSLESS/44100/2
+ a=fmtp:94 baseLayer=132; blockLength=1024; channelID=2
+ a=maxptime:24
+ a=mid:L1
+
+
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 27]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ m=audio 49202 RTP/AVP 95
+ a=rtpmap:95 ATRAC-ADVANCED-LOSSLESS/44100/2
+ a=fmtp:95 baseLayer=0; blockLength=2048; channelID=2
+ a=maxptime:24
+ a=mid:L2
+ a=depend:95 lay L1:94
+
+ Note that the names of payload format (encoding) and Media subtypes
+ are case-insensitive in both places. Similarly, parameter names are
+ case-insensitive both in Media types and in the default mapping to
+ the SDP a=fmtp attribute.
+
+8. IANA Considerations
+
+ Three new Media subtypes, audio/ATRAC3, audio/ATRAC-X, and
+ audio/ATRAC-ADVANCED-LOSSLESS, have been registered (see Section 7).
+
+9. Security Considerations
+
+ The payload format as described in this document is subject to the
+ security considerations defined in RFC 3550 [1] and any applicable
+ profile, for example, RFC 3551 [3]. Also, the security of Media type
+ registration MUST be taken into account as described in Section 5 of
+ RFC 4855 [6].
+
+ The payload for ATRAC family consists solely of compressed audio data
+ to be decoded and presented as sound, and the standard specifications
+ of ATRAC3, ATRAC-X, and ATRAC Advanced Lossless [9] [10] [11]
+ strictly define the bit stream syntax and the buffer model in decoder
+ side for each codec. So they can not carry "active content" that
+ could impose malicious side effects upon the receiver, and they do
+ not cause any problem of illegal resource consumption in receiver
+ side, as far as the bit streams are conforming to their standard
+ specifications.
+
+ This payload format does not implement any security mechanisms of its
+ own. Confidentiality, integrity protection, and authentication have
+ to be provided by a mechanism external to this payload format, e.g.,
+ SRTP RFC 3711 [13].
+
+10. Considerations on Correct Decoding
+
+10.1. Verification of the Packets
+
+ Verification of the received encoded audio packets MUST be performed
+ so as to ensure correct decoding of the packets. As a most primitive
+ implementation, the comparison of the packet size and payload length
+ can be taken into account. If the UDP packet length is longer than
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 28]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ the RTP packet length, the packet can be accepted, but the extra
+ bytes MUST be ignored. In case of receiving a shorter UDP packet or
+ improperly encoded packets, the packets MUST be discarded.
+
+10.2. Validity Checking of the Packets
+
+ Also, validity checking of the received audio packets MUST be
+ performed. It can be carried out by the decoding process, as the
+ ATRAC format is designed so that the validity of data frames can be
+ determined by decoding the algorithm. The required decoder response
+ to a malformed frame is to discard the malformed data and conceal the
+ errors in the audio output until a valid frame is detected and
+ decoded. This is expected to prevent crashes and other abnormal
+ decoder behavior in response to errors or attacks.
+
+11. References
+
+11.1. Normative References
+
+ [1] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
+ "RTP: A Transport Protocol for Real-Time Applications", STD 64,
+ RFC 3550, July 2003.
+
+ [2] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
+ Description Protocol", RFC 4566, July 2006.
+
+ [3] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
+ Conferences with Minimal Control", STD 65, RFC 3551, July 2003.
+
+ [4] Bradner, S., "Key words for use in RFCs to Indicate Requirement
+ Levels", BCP 14, RFC 2119, March 1997.
+
+ [5] Freed, N. and J. Klensin, "Media Type Specifications and
+ Registration Procedures", BCP 13, RFC 4288, December 2005.
+
+ [6] Casner, S., "Media Type Registration of RTP Payload Formats",
+ RFC 4855, February 2007.
+
+ [7] Camarillo, G., Eriksson, G., Holler, J., and H. Schulzrinne,
+ "Grouping of Media Lines in the Session Description Protocol
+ (SDP)", RFC 3388, December 2002.
+
+ [8] Schierl, T., and S. Wenger, "Signaling Media Decoding
+ Dependency in the Session Description Protocol (SDP)", RFC
+ 5583, July 2009.
+
+ [9] ATRAC3 Standard Specification ver.1.1, Sony Corporation, 2003.
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 29]
+
+RFC 5584 RTP Payload Format for ATRAC Family July 2009
+
+
+ [10] ATRAC-X Standard Specification ver.1.2, Sony Corporation, 2004.
+
+ [11] ATRAC Advanced Lossless Standard Specification ver.1.1, Sony
+ Corporation, 2007.
+
+11.2. Informative References
+
+ [12] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley,
+ M., Bolot, J., Vega-Garcia, A., and S. Fosse-Parisis, "RTP
+ Payload for Redundant Audio Data", RFC 2198, September 1997.
+
+ [13] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+ Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
+ 3711, March 2004.
+
+ [14] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
+ Session Description Protocol (SDP)", RFC 3264, June 2002.
+
+ [15] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
+ Protocol (RTSP)", RFC 2326, April 1998.
+
+ [16] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
+ Protocol", RFC 2974, October 2000.
+
+Authors' Addresses
+
+ Mitsuyuki Hatanaka
+ Sony Corporation, Japan
+ 1-7-1 Konan
+ Minato-ku
+ Tokyo 108-0075
+ Japan
+
+ EMail: actech@jp.sony.com
+
+
+ Jun Matsumoto
+ Sony Corporation, Japan
+ 1-7-1 Konan
+ Minato-ku
+ Tokyo 108-0075
+ Japan
+
+ EMail: actech@jp.sony.com
+
+
+
+
+
+
+
+Hatanaka & Matsumoto Standards Track [Page 30]
+