Internet Engineering Task Force (IETF)                            X. Zhu
Request for Comments: 8593                                       S. Mena
Category: Informational                                    Cisco Systems
ISSN: 2070-1721                                                Z. Sarker
                                                             Ericsson AB
                                                                May 2019


       Video Traffic Models for RTP Congestion Control Evaluations

Abstract

   This document describes two reference video traffic models for
   evaluating RTP congestion control algorithms. The first model
   statistically characterizes the behavior of a live video encoder in
   response to changing requests on the target video rate. The second
   model is trace-driven and emulates the output of actual encoded video
   frame sizes from a high-resolution test sequence. Both models are
   designed to strike a balance between simplicity, repeatability, and
   authenticity in modeling the interactions between a live video
   traffic source and the congestion control module. Finally, the
   document describes how both approaches can be combined into a hybrid
   model.

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for informational purposes.

   This document is a product of the Internet Engineering Task Force
   (IETF). It represents the consensus of the IETF community. It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG). Not all documents
   approved by the IESG are candidates for any level of Internet
   Standard; see Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc8593.

Zhu, et al.                   Informational                     [Page 1]

RFC 8593              Video Traffic Models for RTP              May 2019

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology
   3. Desired Behavior of a Synthetic Video Traffic Model
   4. Interactions between Synthetic Video Traffic Source and
      Other Components at the Sender
   5. A Statistical Reference Model
      5.1. Time-Damped Response to Target-Rate Update
      5.2. Temporary Burst and Oscillation during the Transient
           Period
      5.3. Output-Rate Fluctuation at Steady State
      5.4. Rate Range Limit Imposed by Video Content
   6. A Trace-Driven Model
      6.1. Choosing the Video Sequence and Generating the Traces
      6.2. Using the Traces in the Synthetic Codec
           6.2.1. Main Algorithm
           6.2.2. Notes to the Main Algorithm
      6.3. Varying Frame Rate and Resolution
   7. Combining the Two Models
   8. Reference Implementation
   9. IANA Considerations
   10. Security Considerations
   11. References
       11.1. Normative References
       11.2. Informative References
   Authors' Addresses

1. Introduction

   When evaluating candidate congestion control algorithms designed for
   real-time interactive media, it is important to account for the
   characteristics of traffic patterns generated from a live video
   encoder. Unlike synthetic traffic sources that can conform perfectly
   to the rate-changing requests from the congestion control module, a
   live video encoder can be sluggish in reacting to such changes. The
   output rate of a live video encoder also typically deviates from the
   target rate due to uncertainties in the encoder rate-control process.
   Consequently, end-to-end delay and loss performance of a real-time
   media flow can be further impacted by rate variations introduced by
   the live encoder.

   On the other hand, evaluation results of a candidate RTP congestion
   control algorithm should mostly reflect the performance of the
   congestion control module and be decoupled, as far as possible, from
   the peculiarities of any specific video codec. It is also desirable
   that evaluation tests are repeatable and easily duplicated across
   different candidate algorithms.

   One way to strike a balance between the above considerations is to
   evaluate congestion control algorithms using a synthetic video
   traffic source model that captures key characteristics of the
   behavior of a live video encoder.
   The synthetic traffic model should also contain tunable parameters so
   that it can be flexibly adjusted to reflect the wide variations in
   real-world live video encoder behaviors. To this end, this document
   presents two reference models. The first is based on statistical
   modeling. The second is driven by frame size and interval traces
   recorded from a real-world encoder. This document also discusses the
   pros and cons of each approach, as well as how both approaches can be
   combined into a hybrid model.

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3. Desired Behavior of a Synthetic Video Traffic Model

   A live video encoder employs encoder rate control to meet a target
   rate by varying its encoding parameters, such as quantization step
   size, frame rate, and picture resolution, based on its estimate of
   the video content (e.g., motion and scene complexity). In practice,
   however, several factors prevent the output video rate from perfectly
   conforming to the input target rate.

   Due to uncertainties in the captured video scene, the output rate
   typically deviates from the specified target. In the presence of a
   significant change in target rate, the encoder's output frame sizes
   sometimes fluctuate for a short, transient period of time before the
   output rate converges to the new target.
   Finally, while most of the frames in a live session are encoded in
   predictive mode (i.e., P-frames in [H264]), the encoder can
   occasionally generate a large intra-coded frame (i.e., I-frame as
   defined in [H264]) or a frame partially containing intra-coded blocks
   in an attempt to recover from losses, to re-sync with the receiver,
   or during the transient period of responding to target rate or
   spatial resolution changes.

   Hence, a synthetic video source should have the following
   capabilities:

   o  To change bitrate. This includes the ability to change frame rate
      and/or spatial resolution or to skip frames upon request.

   o  To fluctuate around the target bitrate specified by the congestion
      control module.

   o  To show a delay in convergence to the target bitrate.

   o  To generate intra-coded or repair frames on demand.

   While there exist many different approaches to developing a synthetic
   video traffic model, it is desirable that the outcome exhibits a few
   common characteristics, as outlined below.

   o  Low computational complexity: The model should be computationally
      lightweight; otherwise, it defeats the whole purpose of serving as
      a substitute for a live video encoder.

   o  Temporal pattern similarity: The individual traffic trace
      instances generated by the model should mimic the temporal pattern
      of those from a real video encoder.

   o  Statistical resemblance: The synthetic traffic source should match
      the outcome of the real video encoder in terms of statistical
      characteristics, such as the mean, variance, peak, and
      autocorrelation coefficients of the bitrate. It is also important
      that the statistical resemblance hold across different time scales
      ranging from tens of milliseconds to sub-seconds.

   o  A wide range of coverage: The model should be easily configurable
      to cover a wide range of codec behaviors (e.g., with either fast
      or slow reaction time in live encoder rate control) and video
      content variations (e.g., ranging from high to low motion).

   These distinct behavior features can be characterized via simple
   statistical modeling or a trace-driven approach. Sections 5 and 6
   provide an example of each approach, respectively. Section 7
   discusses how both models can be combined.

4. Interactions between Synthetic Video Traffic Source and
   Other Components at the Sender

   Figure 1 depicts the interactions of the synthetic video traffic
   source with other components at the sender, such as the application,
   the congestion control module, the media packet transport module,
   etc. Both reference models, as described later in Sections 5 and 6,
   follow the same set of interactions.

   The synthetic video source dynamically generates a sequence of dummy
   video frames with varying size and interval. These dummy frames are
   processed by other modules in order to transmit the video stream over
   the network. During the lifetime of a video transmission session,
   the synthetic video source will typically be required to adapt its
   encoding bitrate and sometimes the spatial resolution and frame rate.

   In this model, the synthetic video source module has a group of
   incoming and outgoing interface calls that allow for interaction with
   other modules. The following are some of the possible incoming
   interface calls, marked as (a) in Figure 1, that the synthetic video
   traffic source may accept. The list is not exhaustive and can be
   complemented by other interface calls if necessary.

   o  Target bitrate R_v: Target bitrate request measured in bits per
      second (bps). Typically, the congestion control module calculates
      the target bitrate and updates it dynamically over time.
      Depending on the congestion control algorithm in use, the update
      requests can either be periodic (e.g., once per second) or
      on-demand (e.g., only when a drastic bandwidth change over the
      network is observed).

   o  Target frame rate FPS: The instantaneous frame rate measured in
      frames per second at a given time. This depends on the native
      camera-capture frame rate as well as the target/preferred frame
      rate configured by the application or user.

   o  Target frame resolution XY: The 2-dimensional vector indicating
      the preferred frame resolution in pixels. Several factors govern
      the resolution requested from the synthetic video source over
      time. Examples of such factors include the capturing resolution
      of the native camera and the display size of the destination
      screen. The target frame resolution also depends on the current
      target bitrate R_v, since it does not make sense to pair very low
      spatial resolutions with very high bitrates, and vice versa.

   o  Instant frame skipping: The request to skip the encoding of one or
      several captured video frames, for instance, when a drastic
      decrease in available network bandwidth is detected.

   o  On-demand generation of intra (I) frame: The request to encode
      another I-frame to avoid further error propagation at the receiver
      when severe packet losses are observed. This request typically
      comes from the error control module. It can be initiated either
      by the sender or by the receiver via Full Intra Request (FIR)
      messages as defined in [RFC5104].

   An example of an outgoing interface call, marked as (b) in Figure 1,
   is the rate range [R_min, R_max]. Here, R_min and R_max are meant to
   capture the dynamic rate range the actual live video encoder is
   capable of generating given the input video content.
   This typically depends on the video content complexity and/or display
   type (e.g., higher R_max for video content with higher motion
   complexity or for displays of higher resolution). Therefore, these
   values will not change with R_v but may change over time if the
   content is changing.

                      +-------------+
                      |             |   dummy encoded
                      |  Synthetic  |   video frames
                      |    Video    | -------------->
                      |   Source    |
                      |             |
                      +--------+----+
                         /|\   |
                          |    |
       -------------------+    +-------------------->
         interface from           interface to
         other modules (a)        other modules (b)

         Figure 1: Interaction between Synthetic Video Encoder
                     and Other Modules at the Sender

5. A Statistical Reference Model

   This section describes one simple statistical model of the live video
   traffic source. Figure 2 summarizes the list of tunable parameters
   in this statistical model. A more comprehensive survey of popular
   methods for modeling the behavior of video traffic sources can be
   found in [Tanwir2013].

   +===========+====================================+================+
   | Notation  | Parameter Name                     | Example Value  |
   +===========+====================================+================+
   | R_v       | Target bitrate request             | 1 Mbps         |
   +-----------+------------------------------------+----------------+
   | FPS       | Target frame rate                  | 30 Hz          |
   +-----------+------------------------------------+----------------+
   | tau_v     | Encoder reaction latency           | 0.2 s          |
   +-----------+------------------------------------+----------------+
   | K_d       | Burst duration of the transient    | 8 frames       |
   |           | period                             |                |
   +-----------+------------------------------------+----------------+
   | K_B       | Burst frame size during the        | 13.5 KB*       |
   |           | transient period                   |                |
   +-----------+------------------------------------+----------------+
   | t0        | Reference frame interval 1/FPS     | 33 ms          |
   +-----------+------------------------------------+----------------+
   | B0        | Reference frame size R_v/8/FPS     | 4.17 KB        |
   +-----------+------------------------------------+----------------+
   |           | Scaling parameter of the zero-mean |                |
   |           | Laplacian distribution describing  |                |
   | SCALE_t   | deviations in normalized frame     | 0.15           |
   |           | interval (t-t0)/t0                 |                |
   +-----------+------------------------------------+----------------+
   |           | Scaling parameter of the zero-mean |                |
   |           | Laplacian distribution describing  |                |
   | SCALE_B   | deviations in normalized frame     | 0.15           |
   |           | size (B-B0)/B0                     |                |
   +-----------+------------------------------------+----------------+
   | R_min     | Minimum rate supported by video    | 150 kbps       |
   |           | encoder type or content activity   |                |
   +-----------+------------------------------------+----------------+
   | R_max     | Maximum rate supported by video    | 1.5 Mbps       |
   |           | encoder type or content activity   |                |
   +===========+====================================+================+

   * Example value of K_B for a video stream encoded at 720p and
     30 frames per second using H.264/AVC encoder

    Figure 2: List of Tunable Parameters in a Statistical Video Traffic
                               Source Model

5.1. Time-Damped Response to Target-Rate Update

   While the congestion control module can update its target bitrate
   request R_v at any time, the statistical model dictates that the
   encoder will only react to such changes tau_v seconds after a
   previous rate transition. In other words, when the encoder has
   reacted to a rate-change request at time t, it will simply ignore all
   subsequent rate-change requests until time t+tau_v.

5.2. Temporary Burst and Oscillation during the Transient Period

   The output bitrate R_o during the period [t, t+tau_v] is considered
   to be in a transient state when reacting to abrupt changes in target
   rate. Based on observations from video encoder output, the encoder
   reaction to a new target bitrate request can be characterized by high
   variations in output frame sizes. It is assumed in the model that
   the overall average output bitrate R_o during this transient period
   matches the target bitrate R_v. Consequently, the occasional burst
   of large frames is followed by smaller-than-average encoded frames.

   This temporary burst is characterized by two parameters:

   o  burst duration K_d: Number of frames in the burst event, and

   o  burst frame size K_B: Size of the initial burst frame, which is
      typically significantly larger than the average frame size at
      steady state.

   Note that these burst parameters can also be used to mimic the
   insertion of a large on-demand I-frame in the presence of severe
   packet losses. The values of K_d and K_B typically depend on the
   type of video codec, spatial and temporal resolution of the encoded
   stream, as well as the activity level in the video content.

5.3. Output-Rate Fluctuation at Steady State

   The output bitrate R_o during steady state is modeled as randomly
   fluctuating around the target bitrate R_v. The output traffic can be
   characterized as the combination of two random processes that denote
   the frame interval t and output frame size B over time, which are the
   two major sources of variations in the encoder output. For
   simplicity, the deviations of t and B from their respective reference
   levels are modeled as independent and identically distributed
   (i.i.d.) random variables following the Laplacian distribution
   [Papoulis]. More specifically:

   o  Fluctuations in frame interval: The intervals between adjacent
      frames have been observed to fluctuate around the reference
      interval of t0 = 1/FPS. Deviations in normalized frame interval
      DELTA_t = (t-t0)/t0 can be modeled by a zero-mean Laplacian
      distribution with scaling parameter SCALE_t. The value of SCALE_t
      dictates the "width" of the Laplacian distribution and therefore
      the amount of fluctuation in actual frame intervals (t) with
      respect to the reference frame interval t0.

   o  Fluctuations in frame size: The output-encoded frame sizes also
      tend to fluctuate around the reference frame size B0 = R_v/8/FPS.
      Likewise, deviations in the normalized frame size DELTA_B =
      (B-B0)/B0 can be modeled by a zero-mean Laplacian distribution
      with scaling parameter SCALE_B. The value of SCALE_B dictates the
      "width" of this second Laplacian distribution and correspondingly
      the amount of fluctuations in output frame sizes (B) with respect
      to the reference target B0.

   Both values of SCALE_t and SCALE_B can be obtained via parameter
   fitting from empirical data captured for a given video encoder.
   Example values are listed in Figure 2 based on empirical data
   presented in [IETF-Interim].

5.4. Rate Range Limit Imposed by Video Content

   The output bitrate R_o is further clipped within the dynamic range
   [R_min, R_max], whose bounds are, in reality, dictated by the scene
   and motion complexity of the captured video content. In the proposed
   statistical model, these parameters are specified by the application.

6. A Trace-Driven Model

   The second approach for modeling a video traffic source is
   trace-driven. This can be achieved by running an actual live video
   encoder on a set of chosen raw video sequences and using the
   encoder's output traces for constructing a synthetic video source.
   With this approach, the recorded video traces naturally exhibit
   temporal fluctuations around a given target bitrate request R_v from
   the congestion control module.

   The following list summarizes the main steps of this approach:

   1. Choose one or more representative raw video sequences.

   2. Encode the sequence(s) using an actual live video encoder.
      Repeat the process for a number of bitrates. Keep only the
      sequence of frame sizes for each bitrate.

   3. Construct a data structure that contains the output of the
      previous step. The data structure should allow for easy bitrate
      lookup.

   4. Upon a target bitrate request R_v from the controller, look up
      the closest bitrates among those previously stored. Use the
      frame-size sequences stored for those bitrates to approximate the
      frame sizes to output.

   5. The output of the synthetic video traffic source contains
      "encoded" frames with dummy contents but with realistic sizes.

   Section 6.1 explains the first three steps (1-3), and Section 6.2
   elaborates on the remaining two steps (4-5). Finally, Section 6.3
   briefly discusses the possibility of extending the trace-driven model
   to support time-varying frame rate and/or time-varying frame
   resolution.

6.1. Choosing the Video Sequence and Generating the Traces

   The first step is a careful choice of a set of video sequences that
   are representative of the target use cases for the video traffic
   model. For the example use case of interactive video conferencing,
   it is recommended to choose a sequence with content that resembles a
   "talking head", e.g., from a news broadcast or recording of an actual
   video conferencing call.

   The length of the chosen video sequence is a tradeoff. If it is too
   long, it will be difficult to manage the data structures containing
   the traces. If it is too short, there will be an obvious periodic
   pattern in the output frame sizes, leading to biased results when
   evaluating congestion control performance. It has been empirically
   determined that a sequence 2 to 4 minutes in length sufficiently
   avoids the periodic pattern.

   Given the chosen raw video sequence, denoted "S", one can use a live
   encoder, e.g., some implementation of [H264] or [H265], to produce a
   set of encoded sequences. As discussed in Section 3, the output
   bitrate of the live encoder can be controlled by tuning three input
   parameters: quantization step size, frame rate, and picture
   resolution. In order to simplify the choice of these parameters for
   a given target rate, one can typically assume a fixed frame rate
   (e.g., 30 fps) and a fixed resolution (e.g., 720p) when configuring
   the live encoder. See Section 6.3 for a discussion on how to relax
   these assumptions.

   Following these simplifications, the chosen encoder can be configured
   to start at a constant target bitrate, then vary the quantization
   step size (internally via the video encoder rate controller) to meet
   various externally specified target rates.
   It can further be assumed that the first frame is encoded as an
   I-frame and the rest are P-frames (see, e.g., [H264] for definitions
   of I-frames and P-frames). For live encoding, the encoder
   rate-control algorithm typically does not use knowledge of frames in
   the future when encoding a given frame.

   Given the minimum and maximum bitrates at which the synthetic codec
   is to operate (denoted as "R_min" and "R_max", see Section 4), the
   entire range of target bitrates can be divided into n_s steps. This
   leads to an encoding bitrate ladder of (n_s + 1) choices equally
   spaced apart by the step length l = (R_max - R_min)/n_s. The
   following simple algorithm is used to encode the raw video sequence.

      r = R_min
      while r <= R_max do
          Traces[r] = encode_sequence(S, r, e)
          r = r + l

   The function encode_sequence takes as input parameters, respectively,
   a raw video sequence (S), a constant target rate (r), and an encoder
   rate-control algorithm (e); it returns a vector with the sizes of
   frames in the order they were encoded. The output vector is stored
   in a map structure called "Traces", whose keys are bitrates and whose
   values are vectors of frame sizes.

   The choice of a value for the number of bitrate steps n_s is
   important, since it determines the number of vectors of frame sizes
   stored in the map Traces. The minimum value one can choose for n_s
   is 1; the maximum value depends on the amount of memory available for
   holding the map Traces. A reasonable value for n_s is one that
   results in steps of length l = 200 kbps. Section 6.2.2 discusses the
   choice of step length l further.

   Finally, note that, as mentioned in previous sections, R_min and
   R_max may be modified after the initial sequences are encoded.
   Henceforth, for notational clarity, we refer to the bitrate range of
   the trace file as [Rf_min, Rf_max].
   The algorithm described in Section 6.2.1 also covers the cases when
   the current target bitrate is less than Rf_min or greater than
   Rf_max.

6.2. Using the Traces in the Synthetic Codec

   The main idea behind the trace-driven synthetic codec is that it
   mimics the rate-adaptation behavior of a real live codec upon dynamic
   updates of the target bitrate request R_v by the congestion control
   module. It does so by switching to a different frame-size vector
   stored in the map Traces when needed.

6.2.1. Main Algorithm

   The main algorithm for rate adaptation in the synthetic codec
   maintains two variables: r_current and t_current.

   o  The variable r_current points to one of the keys of map Traces.
      Upon a change in the value of R_v, typically because the
      congestion controller detects that the network conditions have
      changed, r_current is updated based on R_v as follows:

         R_ref = min(Rf_max, max(Rf_min, R_v))

         r_current = r
            such that
              (r in keys(Traces) and
               r <= R_ref and
               (not(exists) r' in keys(Traces) such that r < r' <= R_ref))

      In other words, r_current is the largest bitrate among the keys of
      Traces that does not exceed R_ref.

   o  The variable t_current is an index to the frame-size vector stored
      in Traces[r_current]. It is updated every time a new frame is
      due. It is assumed that all vectors stored in Traces have the
      same size, denoted as "size_traces". The following equation
      governs the update of t_current:

         if t_current < SkipFrames then
             t_current = t_current + 1
         else
             t_current = ((t_current + 1 - SkipFrames)
                          % (size_traces - SkipFrames)) + SkipFrames

      where operator "%" denotes modulo, and SkipFrames is a predefined
      constant that denotes the number of frames to be skipped at the
      beginning of frame-size vectors after t_current has wrapped around.
      The purpose of SkipFrames is to avoid the effect of periodically
      sending a large I-frame followed by several smaller-than-average
      P-frames.
      A typical value of SkipFrames is 20, although it could be set to 0
      if one is interested in studying the effect of sending I-frames
      periodically.

   The initial value of r_current is set to Rf_min, and the initial
   value of t_current is set to 0.

   When a new frame is due, its size can be calculated following one of
   the three cases below:

   a) Rf_min <= R_v < Rf_max: The output frame size is calculated via
      linear interpolation of the frame sizes appearing in
      Traces[r_current] and Traces[r_current + l]. The interpolation is
      done as follows:

         size_lo = Traces[r_current][t_current]
         size_hi = Traces[r_current + l][t_current]
         distance_lo = (R_v - r_current) / l
         framesize = size_hi*distance_lo + size_lo*(1-distance_lo)

   b) R_v < Rf_min: The output frame size is calculated via scaling
      with respect to the lowest bitrate Rf_min in the trace file, as
      follows:

         w = R_v / Rf_min
         framesize = max(fs_min, w * Traces[Rf_min][t_current])

   c) R_v >= Rf_max: The output frame size is calculated by scaling
      with respect to the highest bitrate Rf_max in the trace file, as
      follows:

         w = R_v / Rf_max
         framesize = min(fs_max, w * Traces[Rf_max][t_current])

   In cases b) and c), floating-point arithmetic is used for computing
   the scaling factor "w". The resulting value of the instantaneous
   frame size (framesize) is further clipped within a reasonable range
   between fs_min (e.g., 10 bytes) and fs_max (e.g., 1 MB).

6.2.2. Notes to the Main Algorithm

   Note that the main algorithm as described above can be further
   extended to mimic some additional typical behaviors of a live video
   encoder. Two examples are given below:

   o  I-frames on demand: The synthetic codec can be extended to
      simulate the sending of I-frames on demand, e.g., as a reaction to
      losses.
      To implement this extension, the codec's incoming interface (see
      (a) in Figure 1) is augmented with a new function to request a new
      I-frame. Upon calling such a function, t_current is reset to 0.

   o  Variable step length l between R_min and R_max: In the main
      algorithm, the step length l is fixed for ease of explanation.
      However, if the range [R_min, R_max] is very wide, it is also
      possible to define a set of intermediate encoding rates with
      variable step length. The rationale behind this modification is
      that the difference between 400 and 600 kbps as target bitrate is
      much more significant than the difference between 4400 kbps and
      4600 kbps. For example, one could define steps of length 200 kbps
      under 1 Mbps, then steps of length 300 kbps between 1 Mbps and
      2 Mbps, then 400 kbps between 2 Mbps and 3 Mbps, and so on.

6.3. Varying Frame Rate and Resolution

   The trace-driven synthetic codec model described in Sections 6.1 and
   6.2 is relatively simple due to the choice of fixed frame rate and
   frame resolution. The model can be extended further to accommodate
   variable frame rate and/or variable spatial resolution.

   When the encoded picture quality at a given bitrate is low, one can
   potentially decrease either the frame rate (if the video sequence is
   currently in low motion) or the spatial resolution in order to
   improve quality of experience (QoE) in the overall encoded video. On
   the other hand, if target bitrate increases to a point where there is
   no longer a perceptible improvement in the picture quality of
   individual frames, then one might afford to increase the spatial
   resolution or the frame rate (useful if the video is currently in
   high motion).
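   Any such extension would build on the fixed-frame-rate,
   fixed-resolution baseline of Section 6.2.1. As a point of reference,
   the following is a minimal Python sketch of that baseline. It is not
   the reference implementation cited in Section 8, and the class and
   method names are hypothetical. It assumes integer bitrate keys in bps
   that are equally spaced by the step length l, equal-length
   frame-size vectors in bytes, and that a frame's size is computed
   before t_current advances (the text does not fix this order).
   r_current starts at the lowest bitrate in the trace map, and the
   scaling factor w is applied in both cases b) and c).

```python
import bisect


class TraceDrivenCodec:
    """Hypothetical sketch of the Section 6.2.1 main algorithm.

    traces maps an integer bitrate in bps to a list of frame sizes in
    bytes. Keys are assumed equally spaced by the step length l, and all
    lists are assumed to have the same length (size_traces).
    """

    def __init__(self, traces, skip_frames=20, fs_min=10, fs_max=1_000_000):
        self.traces = traces
        self.keys = sorted(traces)
        self.rf_min, self.rf_max = self.keys[0], self.keys[-1]
        self.step = self.keys[1] - self.keys[0] if len(self.keys) > 1 else 0
        self.size_traces = len(traces[self.rf_min])
        if any(len(v) != self.size_traces for v in traces.values()):
            raise ValueError("all frame-size vectors must have the same length")
        if skip_frames >= self.size_traces:
            raise ValueError("SkipFrames must be smaller than size_traces")
        self.skip_frames = skip_frames
        self.fs_min, self.fs_max = fs_min, fs_max
        self.r_v = self.rf_min        # latest target bitrate request R_v
        self.r_current = self.rf_min  # initial value: lowest trace bitrate
        self.t_current = 0

    def set_target_rate(self, r_v):
        """Record R_v and move r_current to the largest key <= R_ref."""
        self.r_v = r_v
        r_ref = min(self.rf_max, max(self.rf_min, r_v))
        self.r_current = self.keys[bisect.bisect_right(self.keys, r_ref) - 1]

    def request_iframe(self):
        """I-frame on demand (Section 6.2.2): restart at the first frame."""
        self.t_current = 0

    def next_frame_size(self):
        """Return the size in bytes of the frame that is now due."""
        t = self.t_current
        if self.r_v < self.rf_min:      # case b): scale down from Rf_min
            w = self.r_v / self.rf_min
            size = max(self.fs_min, w * self.traces[self.rf_min][t])
        elif self.r_v >= self.rf_max:   # case c): scale up from Rf_max
            w = self.r_v / self.rf_max
            size = min(self.fs_max, w * self.traces[self.rf_max][t])
        else:                           # case a): interpolate two traces
            size_lo = self.traces[self.r_current][t]
            size_hi = self.traces[self.r_current + self.step][t]
            distance_lo = (self.r_v - self.r_current) / self.step
            size = size_hi * distance_lo + size_lo * (1 - distance_lo)
        self._advance()
        return size

    def _advance(self):
        """Update t_current, skipping SkipFrames entries after wrap-around."""
        if self.t_current < self.skip_frames:
            self.t_current += 1
        else:
            self.t_current = ((self.t_current + 1 - self.skip_frames)
                              % (self.size_traces - self.skip_frames)
                              + self.skip_frames)


traces = {200_000: [3000, 800, 900, 1000], 400_000: [6000, 1600, 1800, 2000]}
codec = TraceDrivenCodec(traces, skip_frames=1)
codec.set_target_rate(300_000)
print([codec.next_frame_size() for _ in range(5)])
# [4500.0, 1200.0, 1350.0, 1500.0, 1200.0]
```

   With skip_frames=1 in this toy example, the index wraps from the last
   entry back to index 1, so the large first (I-frame) entry is replayed
   only after request_iframe() resets t_current to 0.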

   Many techniques have been proposed to choose, over time, the best
   combination of encoder-quantization step size, frame rate, and
   spatial resolution in order to maximize the quality of live video
   codecs [Ozer2011] [Hu2012]. Future work may consider extending the
   trace-driven codec to accommodate variable frame rate and/or
   resolution.

   From the perspective of congestion control, varying the spatial
   resolution typically requires a new intra-coded frame to be
   generated, thereby incurring a temporary burst in the output traffic
   pattern. The impact of frame-rate change tends to be more subtle:
   reducing frame rate from high to low leads to sparsely spaced larger
   encoded packets instead of many densely spaced smaller packets. Such
   a difference in traffic profiles may still affect the performance of
   congestion control, especially when outgoing packets are not paced by
   the media transport module. Investigation of varying frame rate and
   resolution is left for future work.

7. Combining the Two Models

   It is worth noting that the statistical and trace-driven models each
   have their own advantages and drawbacks. Both models are fairly
   simple to implement. It takes significantly greater effort to fit
   the parameters of a statistical model to actual encoder output data.
   In contrast, it is straightforward for a trace-driven model to obtain
   encoded frame-size data. Once validated, the statistical model is
   more flexible in mimicking a wide range of encoder/content behaviors
   by simply varying the corresponding parameters in the model. A
   trace-driven model, by contrast, relies by definition on additional
   data-collection efforts to accommodate new codecs or video content.
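   To make the parameter-driven flexibility concrete, the following
   minimal Python sketch implements the statistical model of Section 5
   with the example values of Figure 2. It is not the reference
   implementation cited in Section 8, and all names are hypothetical.
   Three details are assumptions rather than part of the model as
   specified: a burst is triggered when an accepted request changes the
   rate by more than 10% (borrowing the example criterion given later in
   this section for the hybrid model); the K_d-1 frames after the burst
   frame share equally whatever remains of a K_d*B0 byte budget; and the
   [R_min, R_max] clip of Section 5.4 is applied to the target rate from
   which B0 is derived. Laplacian deviations are drawn as the difference
   of two exponential variates, since Python's random module has no
   Laplace sampler.

```python
import random
from dataclasses import dataclass


@dataclass
class StatParams:
    """Tunable parameters; defaults are the example values of Figure 2."""
    fps: float = 30.0               # FPS
    tau_v: float = 0.2              # encoder reaction latency (s)
    k_d: int = 8                    # burst duration (frames)
    k_b: float = 13_500.0           # burst frame size (bytes)
    scale_t: float = 0.15           # Laplacian scale of (t - t0)/t0
    scale_b: float = 0.15           # Laplacian scale of (B - B0)/B0
    r_min: float = 150_000.0        # bps
    r_max: float = 1_500_000.0      # bps
    burst_threshold: float = 0.10   # assumption: relative change for a burst


class StatisticalSource:
    """Hypothetical sketch of the statistical model of Section 5."""

    def __init__(self, params, r_v, seed=None):
        self.p = params
        self.rng = random.Random(seed)
        self.r_v = self._clip(r_v)
        self.last_reaction = float("-inf")
        self.burst_left = 0  # frames remaining in the current transient burst

    def _clip(self, rate):
        # Section 5.4, simplified: clip the target rate to [R_min, R_max].
        return min(self.p.r_max, max(self.p.r_min, rate))

    def _laplace(self, scale):
        # The difference of two i.i.d. exponentials with mean `scale`
        # is zero-mean Laplacian with scaling parameter `scale`.
        return self.rng.expovariate(1 / scale) - self.rng.expovariate(1 / scale)

    def request_rate(self, r_v, now):
        """Handle a request at time `now` (s); return True if the encoder reacts."""
        if now < self.last_reaction + self.p.tau_v:
            return False  # Section 5.1: ignored until tau_v after last reaction
        new_rate = self._clip(r_v)
        if abs(new_rate - self.r_v) > self.p.burst_threshold * self.r_v:
            self.burst_left = self.p.k_d  # Section 5.2: start a transient burst
        self.r_v = new_rate
        self.last_reaction = now
        return True

    def next_frame(self):
        """Return (interval in s, size in bytes) of the next dummy frame."""
        t0 = 1 / self.p.fps
        b0 = self.r_v / 8 / self.p.fps
        # Section 5.3: Laplacian deviation of the frame interval, floored at 0.
        interval = t0 * max(0.0, 1 + self._laplace(self.p.scale_t))
        if self.burst_left == self.p.k_d:
            size = self.p.k_b  # initial burst frame of size K_B
        elif self.burst_left > 0:
            # Assumption: remaining burst frames share what is left of K_d*B0.
            size = max(0.0, (self.p.k_d * b0 - self.p.k_b) / (self.p.k_d - 1))
        else:
            # Section 5.3: Laplacian deviation of the frame size, floored at 0.
            size = b0 * max(0.0, 1 + self._laplace(self.p.scale_b))
        if self.burst_left > 0:
            self.burst_left -= 1
        return interval, size


src = StatisticalSource(StatParams(), r_v=1_000_000, seed=7)
src.request_rate(500_000, now=1.0)  # accepted: starts an 8-frame burst
src.request_rate(900_000, now=1.1)  # ignored: within tau_v = 0.2 s
frames = [src.next_frame() for _ in range(10)]
```

   Changing a single field, such as scale_b for a noisier encoder or k_b
   for a larger I-frame, changes the generated traffic without any new
   data collection, which is the flexibility described above.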

   In general, the trace-driven model is more realistic for mimicking
   the ongoing steady-state behavior of a video traffic source with
   fluctuations around a constant target rate. In contrast, the
   statistical model is more versatile for simulating the behavior of a
   video stream in transient, such as when encountering sudden rate
   changes. It is also possible to combine both methods into a hybrid
   model. In this case, the steady-state behavior is driven by traces,
   and the transient-state behavior is driven by the statistical model.

                               transient +---------------+
                                 state   | Generate next |
                                 +------>| K_d transient |
           +-----------------+  /        |    frames     |
      R_v  | Compare against | /         +---------------+
    ------>|    previous     |/
           | target bitrate  |\
           +-----------------+ \         +---------------+
                                \        | Generate next |
                                 +------>|  frame from   |
                                 steady  |     trace     |
                                 state   +---------------+

                   Figure 3: A Hybrid Video Traffic Model

   As shown in Figure 3, the video traffic model operates in a transient
   state if the requested target rate R_v is substantially different
   from the previous target; otherwise, it operates in a steady state.
   During the transient state, a total of K_d frames are generated by
   the statistical model, resulting in one big burst frame with size K_B
   followed by K_d-1 smaller frames. When operating at steady state, the
   video traffic model simply generates a frame according to the
   trace-driven model given the target rate, while modulating the frame
   interval according to the distribution specified by the statistical
   model. One example criterion for determining whether the traffic
   model should operate in a transient state is whether the rate change
   exceeds 10% of the previous target rate.
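The switching logic of Figure 3 can be sketched in Python. This is an illustrative sketch under stated assumptions, not the Syncodecs implementation. The following are hypothetical:

- the class name;
- the trace format, a mapping from a trace's bitrate in bps to its frame sizes in bytes;
- the nearest-rate trace lookup;
- the example parameter values.

K_B is interpreted here as a multiple of the nominal frame size R_v/(8*fps). The K_d-1 smaller frames are sized so that the whole burst carries K_d nominal frames' worth of bytes. The steady-state frame-interval modulation and the tau_v time damping of Section 5.1 are left out, so every interval is the nominal 1/fps.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class HybridTrafficModel:
    """Sketch: statistical bursts in transient state, traces in steady state."""
    traces: Dict[int, List[int]]   # assumed: trace bitrate (bps) -> frame sizes (bytes)
    fps: float = 30.0              # nominal frame rate; t0 = 1/fps
    k_d: int = 8                   # hypothetical burst duration K_d (frames)
    k_b: float = 4.0               # hypothetical burst size K_B (nominal frames)
    threshold: float = 0.10        # example criterion: rate change above 10%
    prev_target: Optional[float] = None
    pending: List[float] = field(default_factory=list)
    trace_pos: int = 0

    def _start_transient(self, target_bps: float) -> None:
        nominal = target_bps / self.fps / 8.0  # bytes per frame at R_v
        burst = [self.k_b * nominal]           # one big burst frame first
        if self.k_d > 1:
            # Remaining K_d-1 frames share what is left of K_d nominal frames.
            small = max(0.0, (self.k_d - self.k_b) / (self.k_d - 1)) * nominal
            burst += [small] * (self.k_d - 1)
        self.pending = burst

    def next_frame(self, target_bps: float) -> Tuple[float, float]:
        """Return (frame size in bytes, interval to the next frame in seconds)."""
        t0 = 1.0 / self.fps
        prev = self.prev_target
        if prev is not None and abs(target_bps - prev) > self.threshold * prev:
            self._start_transient(target_bps)  # enter (or restart) transient state
        self.prev_target = target_bps
        if self.pending:
            return self.pending.pop(0), t0
        # Steady state: next frame from the trace closest to the target rate.
        rate = min(self.traces, key=lambda r: abs(r - target_bps))
        frames = self.traces[rate]
        size = frames[self.trace_pos % len(frames)]
        self.trace_pos += 1
        return float(size), t0


model = HybridTrafficModel(
    traces={1_000_000: [4000, 4200], 2_000_000: [8000, 8400]}, k_d=4, k_b=2.0)
print(model.next_frame(1_000_000))  # steady state: first trace frame
print(model.next_frame(1_050_000))  # 5% change: still steady state
print([round(model.next_frame(2_000_000)[0]) for _ in range(4)])
# [16667, 5556, 5556, 5556]
```

Comparing each request only against the immediately previous target follows the example criterion literally. Under this criterion, a sequence of small rate changes never triggers a burst.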
   Finally, as this model follows the transient-state behavior dictated
   by the statistical model, upon a substantial rate change it also
   follows the time-damping mechanism defined in Section 5.1, which is
   governed by the parameter tau_v.

8. Reference Implementation

   The statistical, trace-driven, and hybrid models as described in this
   document have been implemented as a stand-alone, platform-independent
   synthetic traffic source module. It can be easily integrated into
   network simulation platforms such as [ns-2] and [ns-3], as well as
   testbeds using a real network. The stand-alone traffic source module
   is available as an open-source implementation at [Syncodecs].

9. IANA Considerations

   This document has no IANA actions.

10. Security Considerations

   The synthetic video traffic models as described in this document do
   not pose any security threats. They are designed to mimic realistic
   traffic patterns for evaluating candidate RTP-based congestion
   control algorithms so as to ensure stable operation of the network.
   It is RECOMMENDED that candidate algorithms be tested using the video
   traffic models presented in this document before wide deployment
   over the Internet. If the generated synthetic traffic flows are sent
   over the Internet, they also need to be congestion controlled.

11. References

11.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

11.2. Informative References

   [H264]     ITU-T, "Advanced video coding for generic audiovisual
              services", Recommendation H.264, April 2017,
              <https://www.itu.int/rec/T-REC-H.264>.

   [H265]     ITU-T, "High efficiency video coding",
              Recommendation H.265, February 2018,
              <https://www.itu.int/rec/T-REC-H.265>.

   [Hu2012]   Hu, H., Ma, Z., and Y. Wang, "Optimization of Spatial,
              Temporal and Amplitude Resolution for Rate-Constrained
              Video Coding and Scalable Video Adaptation", Proc. 19th
              IEEE International Conference on Image Processing (ICIP),
              DOI 10.1109/ICIP.2012.6466960, September 2012.

   [IETF-Interim]
              Zhu, X., Mena, S., and Z. Sarker, "Update on RMCAT Video
              Traffic Model: Trace Analysis and Model Update", IETF
              RMCAT Virtual Interim, April 2017,
              <https://www.ietf.org/proceedings/interim-2017-rmcat-
              01/slides/slides-interim-2017-rmcat-01-sessa-update-on-
              video-traffic-model-draft-00.pdf>.

   [ns-2]     "The Network Simulator - ns-2", December 2015,
              <https://nsnam.sourceforge.net/wiki/index.php/
              User_Information>.

   [ns-3]     "NS-3 Network Simulator", <https://www.nsnam.org/>.

   [Ozer2011] Ozer, J., "Video Compression for Flash, Apple Devices and
              HTML5", Galax: Doceo Publishing, ISBN-13: 978-0976259503,
              2011.

   [Papoulis] Papoulis, A. and S. Pillai, "Probability, Random Variables
              and Stochastic Processes", London: McGraw-Hill Europe,
              ISBN-13: 978-0071226615, 2002.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
              February 2008, <https://www.rfc-editor.org/info/rfc5104>.

   [Syncodecs]
              "Syncodecs: Synthetic codecs for evaluation of RMCAT
              work", commit a92d6c8, May 2018,
              <https://github.com/cisco/syncodecs>.

   [Tanwir2013]
              Tanwir, S. and H. Perros, "A Survey of VBR Video Traffic
              Models", IEEE Communications Surveys and Tutorials, Volume
              15, Issue 4, p. 1778-1802,
              DOI 10.1109/SURV.2013.010413.00071, January 2013.

Authors' Addresses

   Xiaoqing Zhu
   Cisco Systems
   12515 Research Blvd., Building 4
   Austin, TX 78759
   United States of America

   Email: xiaoqzhu@cisco.com


   Sergio Mena
   Cisco Systems
   EPFL, Quartier de l'Innovation, Batiment E
   Ecublens, Vaud 1015
   Switzerland

   Email: semena@cisco.com


   Zaheduzzaman Sarker
   Ericsson AB
   Torshamnsgatan 23
   Stockholm, SE 164 83
   Sweden

   Phone: +46 10 717 37 43
   Email: zaheduzzaman.sarker@ericsson.com