Network Working Group
Request for Comments: 2429                                    C. Bormann
Category: Standards Track                                   Univ. Bremen
                                                                L. Cline
                                                              G. Deisher
                                                               T. Gardos
                                                             C. Maciocco
                                                               D. Newell
                                                                   Intel
                                                                  J. Ott
                                                            Univ. Bremen
                                                             G. Sullivan
                                                              PictureTel
                                                               S. Wenger
                                                               TU Berlin
                                                                  C. Zhu
                                                                   Intel
                                                            October 1998


               RTP Payload Format for the 1998 Version of
                    ITU-T Rec. H.263 Video (H.263+)

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (1998).  All Rights Reserved.

1. Introduction

   This document specifies an RTP payload header format applicable to
   the transmission of video streams generated based on the 1998 version
   of ITU-T Recommendation H.263 [4].  Because the 1998 version of H.263
   is a superset of the 1996 syntax, this format can also be used with
   the 1996 version of H.263 [3], and is recommended for this use by new
   implementations.  This format does not replace RFC 2190, which
   continues to be used by existing implementations, and may be required
   for backward compatibility in new implementations.  Implementations
   using the new features of the 1998 version of H.263 shall use the
   format described in this document.


Bormann, et. al.            Standards Track                     [Page 1]
^L
RFC 2429                         H.263+                     October 1998


   The 1998 version of ITU-T Recommendation H.263 added numerous coding
   options to improve codec performance over the 1996 version.  The 1998
   version is referred to as H.263+ in this document.  Among the new
   options, the ones with the biggest impact on the RTP payload
   specification and the error resilience of the video content are the
   slice structured mode, the independent segment decoding mode, the
   reference picture selection mode, and the scalability mode.  This
   section summarizes the impact of these new coding options on
   packetization.  Refer to [4] for more information on coding options.

   The slice structured mode was added to H.263+ for three purposes: to
   provide enhanced error resilience capability, to make the bitstream
   more amenable to use with an underlying packet transport such as RTP,
   and to minimize video delay.  The slice structured mode supports
   fragmentation at macroblock boundaries.

   With the independent segment decoding (ISD) option, a video picture
   frame is broken into segments and encoded in such a way that each
   segment is independently decodable.  Utilizing ISD in a lossy network
   environment helps to prevent the propagation of errors from one
   segment of the picture to others.

   The reference picture selection mode allows the use of an older
   reference picture rather than the one immediately preceding the
   current picture.  Usually, the last transmitted frame is implicitly
   used as the reference picture for inter-frame prediction.  If the
   reference picture selection mode is used, the data stream carries
   information on what reference frame should be used, indicated by the
   temporal reference as an ID for that reference frame.  The reference
   picture selection mode can be used with or without a back channel,
   which provides information to the encoder about the internal status
   of the decoder.  However, no special provision is made herein for
   carrying back channel information.

   H.263+ also includes bitstream scalability as an optional coding
   mode.  Three kinds of scalability are defined: temporal, signal-to-
   noise ratio (SNR), and spatial scalability.  Temporal scalability is
   achieved via the disposable nature of bi-directionally predicted
   frames, or B-frames. (A low-delay form of temporal scalability known
   as P-picture temporal scalability can also be achieved by using the
   reference picture selection mode described in the previous
   paragraph.)  SNR scalability permits refinement of encoded video
   frames, thereby improving the quality (or SNR).  Spatial scalability
   is similar to SNR scalability except the refinement layer is twice
   the size of the base layer in the horizontal dimension, vertical
   dimension, or both.


2. Usage of RTP

   When transmitting H.263+ video streams over the Internet, the output
   of the encoder can be packetized directly.  All the bits resulting
   from the bitstream including the fixed length codes and variable
   length codes will be included in the packet, with the only exception
   being that when the payload of a packet begins with a Picture, GOB,
   Slice, EOS, or EOSBS start code, the first two (all-zero) bytes of
   the start code are removed and replaced by setting an indicator bit
   in the payload header.

   For H.263+ bitstreams coded with temporal, spatial, or SNR
   scalability, each layer may be transported to a different network
   address.  More specifically, each layer may use a unique IP address
   and port number combination.  The temporal relations between layers
   shall be expressed using the RTP timestamp so that they can be
   synchronized at the receiving ends in multicast or unicast
   applications.

   The H.263+ video stream will be carried as payload data within RTP
   packets.  A new H.263+ payload header is defined in section 4.  This
   section defines the usage of the RTP fixed header and H.263+ video
   packet structure.

2.1 RTP Header Usage

   Each RTP packet starts with a fixed RTP header.  The following fields
   of the RTP fixed header are used for H.263+ video streams:

   Marker bit (M bit): The Marker bit of the RTP header is set to 1 when
   the current packet carries the end of current frame, and is 0
   otherwise.

   Payload Type (PT): The Payload Type shall specify the H.263+ video
   payload format.

   Timestamp: The RTP Timestamp encodes the sampling instant of the
   first video frame data contained in the RTP data packet.  The RTP
   timestamp shall be the same on successive packets if a video frame
   occupies more than one packet.  In a multilayer scenario, all
   pictures corresponding to the same temporal reference should use the
   same timestamp.  If temporal scalability is used (if B-frames are
   present), the timestamp may not be monotonically increasing in the
   RTP stream.  If B-frames are transmitted on a separate layer and
   address, they must be synchronized properly with the reference
   frames.  Refer to the 1998 ITU-T Recommendation H.263 [4] for
   information on required transmission order to a decoder.  For an
   H.263+ video stream, the RTP timestamp is based on a 90 kHz clock,
   the same as that of the RTP payload for H.261 streams [5].  Since
   both the H.263+ data and the RTP header contain time information,
   the two are required to run synchronously.  That is, both the RTP
   timestamp and the temporal reference (TR in the picture header of
   H.263) should carry the same relative timing information.
   Any H.263+ picture clock frequency can be expressed as
   1800000/(cd*cf) source pictures per second, in which cd is an integer
   from 1 to 127 and cf is either 1000 or 1001.  Using the 90 kHz clock
   of the RTP timestamp, the time increment between each coded H.263+
   picture should therefore be an integer multiple of (cd*cf)/20. This
   will always be an integer for any "reasonable" picture clock
   frequency (for example, it is 3003 for 29.97 Hz NTSC, 3600 for 25 Hz
   PAL, 3750 for 24 Hz film, and 1500, 1250 and 1200 for the computer
   display update rates of 60, 72 and 75 Hz, respectively).  For RTP
   packetization of hypothetical H.263+ bitstreams using "unreasonable"
   custom picture clock frequencies, mathematical rounding could become
   necessary for generating the RTP timestamps.
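
   The arithmetic above can be checked with a short sketch.  The
   function and constant names below are illustrative assumptions, not
   part of this specification; only the formula 1800000/(cd*cf) and the
   90 kHz clock are taken from the text.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90_000            # RTP timestamp clock used for H.263+
H263_BASE_CLOCK_HZ = 1_800_000   # numerator of the picture clock frequency


def picture_clock_hz(cd: int, cf: int) -> Fraction:
    """Picture clock frequency 1800000/(cd*cf), in pictures per second."""
    if not 1 <= cd <= 127:
        raise ValueError("cd must be an integer from 1 to 127")
    if cf not in (1000, 1001):
        raise ValueError("cf must be 1000 or 1001")
    return Fraction(H263_BASE_CLOCK_HZ, cd * cf)


def rtp_ticks_per_picture(cd: int, cf: int) -> Fraction:
    """RTP ticks per picture clock interval, which equals (cd*cf)/20."""
    return Fraction(RTP_CLOCK_HZ) / picture_clock_hz(cd, cf)


# 29.97 Hz NTSC corresponds to cd=60, cf=1001; 25 Hz PAL to cd=72, cf=1000.
ntsc_ticks = rtp_ticks_per_picture(60, 1001)
pal_ticks = rtp_ticks_per_picture(72, 1000)
```

   A result whose denominator is not 1 (for example cd=1, cf=1001)
   marks one of the "unreasonable" frequencies for which rounding would
   be needed.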

2.2 Video Packet Structure

   A section of an H.263+ compressed bitstream is carried as a payload
   within each RTP packet.  For each RTP packet, the RTP header is
   followed by an H.263+ payload header, which is followed by a number
   of bytes of a standard H.263+ compressed bitstream.  The size of the
   H.263+ payload header is variable depending on the payload involved
   as detailed in section 4.  The layout of the RTP H.263+ video
   packet is shown as:

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |    RTP Header                                               ...
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |    H.263+ Payload Header                                    ...
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |    H.263+ Compressed Data Stream                            ...
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Any H.263+ start codes can be byte aligned by an encoder by using the
   stuffing mechanisms of H.263+.  As specified in H.263+, picture,
   slice, and EOSBS start codes shall always be byte aligned, and GOB
   and EOS start codes may be byte aligned.  For packetization purposes,
   GOB start codes should be byte aligned; however, since this is not
   required in H.263+, there may be some cases where GOB start codes are
   not aligned, such as when transmitting existing content, or when
   using H.263 encoders that do not support GOB start code alignment.
   In this case, follow-on packets (see section 5.2) should be used for
   packetization.


   All H.263+ start codes (Picture, GOB, Slice, EOS, and EOSBS) begin
   with 16 zero-valued bits.  If a start code is byte aligned and it
   occurs at the beginning of a packet, these two bytes shall be removed
   from the H.263+ compressed data stream in the packetization process
   and shall instead be represented by setting a bit (the P bit) in the
   payload header.
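
   A minimal sender-side sketch of this rule follows.  The function
   name and the at_start_code flag are assumptions made for
   illustration; the caller is assumed to know, from the encoder or a
   bitstream parser, whether the fragment begins at a byte aligned
   start code.

```python
from typing import Tuple


def strip_start_code_prefix(fragment: bytes,
                            at_start_code: bool) -> Tuple[int, bytes]:
    """Return (P, payload) for the bitstream fragment that opens a packet.

    When the fragment begins at a byte aligned start code, its first two
    all-zero bytes are dropped and P is 1.  Otherwise the fragment is
    passed through unchanged and P is 0.
    """
    if not at_start_code:
        return 0, fragment
    if fragment[:2] != b"\x00\x00":
        raise ValueError("a byte aligned start code begins with 16 zero bits")
    return 1, fragment[2:]


# A picture start code is 16 zero bits followed by '100000', so the
# third byte of a byte aligned PSC has its top six bits set to 100000.
p_bit, payload = strip_start_code_prefix(b"\x00\x00\x80\x02", True)
```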

3. Design Considerations

   The goals of this payload format are to specify an efficient way of
   encapsulating an H.263+ standard compliant bitstream and to enhance
   the resiliency towards packet losses.  Due to the large number of
   different possible coding schemes in H.263+, a copy of the picture
   header with configuration information is inserted into the payload
   header when appropriate.  The use of that copy of the picture header
   along with the payload data can allow decoding of a received packet
   even in such cases in which another packet containing the original
   picture header becomes lost.

   There are a few assumptions and constraints associated with this
   H.263+ payload header design.  The purpose of this section is to
   point out various design issues and also to discuss several coding
   options provided by H.263+ that may impact the performance of
   network-based H.263+ video.

   o The optional slice structured mode described in Annex K of H.263+
     [4] enables more flexibility for packetization.  Similar to a
     picture segment that begins with a GOB header, the motion vector
     predictors in a slice are restricted to reside within its
     boundaries.  However, slices provide much greater freedom in the
     selection of the size and shape of the area which is represented as
     a distinct decodable region. In particular, slices can have a size
     which is dynamically selected to allow the data for each slice to
     fit into a chosen packet size. Slices can also be chosen to have a
     rectangular shape which is conducive for minimizing the impact of
     errors and packet losses on motion compensated prediction.  For
     these reasons, the use of the slice structured mode is strongly
     recommended for any applications used in environments where
     significant packet loss occurs.

   o In non-rectangular slice structured mode, only complete slices
     should be included in a packet.  In other words, slices should not
     be fragmented across packet boundaries.  The only reasonable need
     for a slice to be fragmented across packet boundaries is when the
     encoder which generated the H.263+ data stream could not be
     influenced by an awareness of the packetization process (such as
     when sending H.263+ data through a network other than the one to
     which the encoder is attached, as in network gateway
     implementations).  Optimally, each packet will contain only one
     slice.

   o The independent segment decoding (ISD) described in Annex R of [4]
     prevents any data dependency across slice or GOB boundaries in the
     reference picture.  It can be utilized to further improve
     resiliency in high loss conditions.

   o If ISD is used in conjunction with the slice structure, the
     rectangular slice submode shall be enabled and the dimensions and
     quantity of the slices present in a frame shall remain the same
     between each two intra-coded frames (I-frames), as required in
     H.263+. The individual ISD segments may also be entirely intra
     coded from time to time to realize quick error recovery without
     adding the latency time associated with sending complete INTRA-
     pictures.

   o When the slice structure is not applied, the insertion of a
     (preferably byte-aligned) GOB header can be used to provide resync
     boundaries in the bitstream, as the presence of a GOB header
     eliminates the dependency of motion vector prediction across GOB
     boundaries.  These resync boundaries provide natural locations for
     packet payload boundaries.

   o H.263+ allows picture headers to be sent in an abbreviated form in
     order to prevent repetition of overhead information that does not
     change from picture to picture.  For resiliency, sending a complete
     picture header for every frame is often advisable.  This means that
     (especially in cases with high packet loss probability in which
     picture header contents are not expected to be highly predictable),
     the sender may find it advisable to always set the subfield UFEP in
     PLUSPTYPE to '001' in the H.263+ video bitstream.  (See [4] for the
     definition of the UFEP and PLUSPTYPE fields).

   o In a multi-layer scenario, each layer may be transmitted to a
     different network address.  The configuration of each layer such as
     the enhancement layer number (ELNUM), reference layer number
     (RLNUM), and scalability type should be determined at the start of
     the session and should not change during the course of the session.

   o All start codes can be byte aligned, and picture, slice, and EOSBS
     start codes are always byte aligned.  The boundaries of these
     syntactical elements provide ideal locations for placing packet
     boundaries.


   o We assume that a maximum Picture Header size of 504 bits is
     sufficient.  The syntax of H.263+ does not explicitly prohibit
     larger picture header sizes, but the use of such extremely large
     picture headers is not expected.

4. H.263+ Payload Header

   For H.263+ video streams, each RTP packet carries only one H.263+
   video packet.  The H.263+ payload header is always present for each
   H.263+ video packet.  The payload header is of variable length.  A 16
   bit field of the basic payload header may be followed by an 8 bit
   field for Video Redundancy Coding (VRC) information, and/or by a
   variable length extra picture header as indicated by PLEN. These
   optional fields appear in the order given above when present.

   If an extra picture header is included in the payload header, the
   length of the picture header in number of bytes is specified by PLEN.
   The minimum length of the payload header is 16 bits, corresponding to
   PLEN equal to 0 and no VRC information present.

   The remainder of this section defines the various components of the
   RTP payload header.  Section 5 defines the various packet types
   that are used to carry different types of H.263+ coded data, and
   section 6 summarizes how to distinguish between the various packet
   types.

4.1 General H.263+ payload header

   The H.263+ payload header is structured as follows:

      0                   1
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |   RR    |P|V|   PLEN    |PEBIT|
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   RR: 5 bits
     Reserved bits.  Shall be zero.

   P: 1 bit
     Indicates the picture start or a picture segment (GOB/Slice) start
     or a video sequence end (EOS or EOSBS).  Two bytes of zero bits
     then have to be prefixed to the payload of such a packet to compose
     a complete picture/GOB/slice/EOS/EOSBS start code.  This bit allows
     the omission of the two first bytes of the start codes, thus
     improving the compression ratio.


   V: 1 bit
     Indicates the presence of an 8 bit field containing information for
     Video Redundancy Coding (VRC), which follows immediately after the
     initial 16 bits of the payload header if present.  For syntax and
     semantics of that 8 bit VRC field see section 4.2.

   PLEN: 6 bits
     Length in bytes of the extra picture header.  If no extra picture
     header is attached, PLEN is 0.  If PLEN>0, the extra picture header
     is attached immediately following the rest of the payload header.
     Note the length reflects the omission of the first two bytes of the
     picture start code (PSC).  See section 5.1.

   PEBIT: 3 bits
     Indicates the number of bits that shall be ignored in the last byte
     of the picture header.  If PLEN is not zero, the ignored bits shall
     be the least significant bits of the byte.  If PLEN is zero, then
     PEBIT shall also be zero.
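
   The 16 bit basic payload header can be packed and parsed as
   sketched below.  The type and function names are illustrative
   assumptions; the bit positions follow the diagram in this section,
   with RR in the five most significant bits and always written as
   zero.

```python
import struct
from typing import NamedTuple


class PayloadHeader(NamedTuple):
    p: int      # 1 if two zero bytes of a start code were removed
    v: int      # 1 if an 8 bit VRC field follows the basic header
    plen: int   # length in bytes of the extra picture header (0..63)
    pebit: int  # bits to ignore in the last extra picture header byte


def pack_payload_header(h: PayloadHeader) -> bytes:
    """Encode the basic payload header as two bytes in network order."""
    if h.p not in (0, 1) or h.v not in (0, 1):
        raise ValueError("P and V are single bits")
    if not 0 <= h.plen <= 63 or not 0 <= h.pebit <= 7:
        raise ValueError("PLEN is 6 bits and PEBIT is 3 bits")
    if h.plen == 0 and h.pebit != 0:
        raise ValueError("PEBIT shall be zero when PLEN is zero")
    # RR occupies bits 15..11 of the word and is left as zero.
    word = (h.p << 10) | (h.v << 9) | (h.plen << 3) | h.pebit
    return struct.pack("!H", word)


def unpack_payload_header(data: bytes) -> PayloadHeader:
    """Decode the first two bytes of an RTP payload."""
    (word,) = struct.unpack("!H", data[:2])
    return PayloadHeader(
        p=(word >> 10) & 1,
        v=(word >> 9) & 1,
        plen=(word >> 3) & 0x3F,
        pebit=word & 0x07,
    )
```

   Since PLEN is a 6 bit byte count, the largest extra picture header
   this layout can describe is 63 bytes, that is, 504 bits.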

4.2 Video Redundancy Coding Header Extension

   Video Redundancy Coding (VRC) is an optional mechanism intended to
   improve error resilience over packet networks.  Implementing VRC in
   H.263+ will require the Reference Picture Selection option described
   in Annex N of [4].  By having multiple "threads" of independently
   inter-frame predicted pictures, damage to an individual frame will
   cause distortions only within its own thread but leave the other
   threads unaffected.  From time to time, all threads converge to a
   so-called sync frame (an INTRA picture or a non-INTRA picture which
   is redundantly represented within multiple threads); from this sync
   frame, the independent threads are started again.  For more
   information on codec support for VRC see [7].

   P-picture temporal scalability is another use of the reference
   picture selection mode and can be considered a special case of VRC in
   which only one copy of each sync frame may be sent.  It offers a
   thread-based method of temporal scalability without the increased
   delay caused by the use of B pictures.  In this use, sync frames sent
   in the first thread of pictures are also used for the prediction of a
   second thread of pictures which fall temporally between the sync
   frames to increase the resulting frame rate.  In this use, the
   pictures in the second thread can be discarded in order to obtain a
   reduction of bit rate or decoding complexity without harming the
   ability to decode later pictures.  A third thread or more can also
   be added, but each thread is predicted only from the sync frames
   (which are sent at least in thread 0) or from frames within the same
   thread.


   While a VRC data stream is, like all H.263+ data, totally
   self-contained, it may be useful for the transport hierarchy
   implementation to have knowledge about the current damage status of
   each thread.  On the Internet, this status can easily be determined
   by observing the marker bit, the sequence number of the RTP header,
   and the thread-id and a cyclic "packet per thread" number.  The
   latter two numbers are coded in the VRC header extension.

   The format of the VRC header extension is as follows:

      0 1 2 3 4 5 6 7
     +-+-+-+-+-+-+-+-+
     | TID | Trun  |S|
     +-+-+-+-+-+-+-+-+

   TID: 3 bits
     Thread ID.  Up to 7 threads are allowed. Each frame of H.263+ VRC
     data will use as reference information only sync frames or frames
     within the same thread.  By convention, thread 0 is expected to be
     the "canonical" thread, which is the thread from which the sync
     frame should ideally be used.  In the case of corruption or loss of
     the thread 0 representation, a representation of the sync frame
     with a higher thread number can be used by the decoder.  Lower
     thread numbers are expected to contain equal or better
     representations of the sync frames than higher thread numbers in
     the absence of data corruption or loss.  See [7] for a detailed
     discussion of VRC.

   Trun: 4 bits
     Monotonically increasing (modulo 16) 4 bit number counting the
     packet number within each thread.

   S: 1 bit
     A bit that indicates that the packet content is for a sync frame.
     An encoder using VRC may send several representations of the same
     "sync" picture, in order to ensure that regardless of which thread
     of pictures is corrupted by errors or packet losses, the reception
     of at least one representation of a particular picture is ensured
     (within at least one thread).  The sync picture can then be used
     for the prediction of any thread.  If packet losses have not
     occurred, then the sync frame contents of thread 0 can be used and
     those of other threads can be discarded (and similarly for other
     threads).  Thread 0 is considered the "canonical" thread, the use
     of which is preferable to all others.  The contents of packets
     having lower thread numbers shall be considered as having a higher
     processing and delivery priority than those with higher thread
     numbers.  Thus packets having lower thread numbers for a given sync
     frame shall be delivered first to the decoder under loss-free and
     low-time-jitter conditions, which will result in the discarding of
     the sync contents of the higher-numbered threads as specified in
     Annex N of [4].
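
   The 8 bit VRC header extension can be handled as sketched below.
   The function names are illustrative assumptions; the field widths
   and the modulo 16 behaviour of Trun are taken from the definitions
   above.

```python
from typing import Tuple


def pack_vrc(tid: int, trun: int, s: int) -> bytes:
    """Encode TID (3 bits), Trun (4 bits) and S (1 bit) into one byte."""
    if not 0 <= tid <= 7:
        raise ValueError("TID is a 3 bit field")
    if not 0 <= trun <= 15:
        raise ValueError("Trun is a 4 bit field")
    if s not in (0, 1):
        raise ValueError("S is a single bit")
    return bytes([(tid << 5) | (trun << 1) | s])


def unpack_vrc(octet: int) -> Tuple[int, int, int]:
    """Decode one VRC byte into (TID, Trun, S)."""
    return (octet >> 5) & 0x07, (octet >> 1) & 0x0F, octet & 0x01


def next_trun(trun: int) -> int:
    """Advance the per-thread packet counter, wrapping modulo 16."""
    return (trun + 1) % 16
```

   A receiver could compare the Trun of consecutive packets carrying
   the same TID against next_trun() to notice a gap within that thread.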

5. Packetization schemes

5.1 Picture Segment Packets and Sequence Ending Packets (P=1)

   A picture segment packet is defined as a packet that starts at the
   location of a Picture, GOB, or slice start code in the H.263+ data
   stream.  This corresponds to the definition of the start of a video
   picture segment as defined in H.263+.  For such packets, P=1 always.

   An extra picture header can sometimes be attached in the payload
   header of such packets.  Whenever an extra picture header is attached
   as signified by PLEN>0, only the last six bits of its picture start
   code, '100000', are included in the payload header.  A complete
   H.263+ picture header with byte aligned picture start code can be
   conveniently assembled on the receiving end by prepending the sixteen
   leading '0' bits.

   When PLEN>0, the end bit position corresponding to the last byte of
   the picture header data is indicated by PEBIT.  The actual bitstream
   data shall begin on an 8-bit byte boundary following the payload
   header.

   A sequence ending packet is defined as a packet that starts at the
   location of an EOS or EOSBS code in the H.263+ data stream.  This
   delineates the end of a sequence of H.263+ video data (more H.263+
   video data may still follow later, however, as specified in ITU-T
   Recommendation H.263).  For such packets, P=1 and PLEN=0 always.

   The optional header extension for VRC may or may not be present as
   indicated by the V bit flag.

5.1.1 Packets that begin with a Picture Start Code

   Any packet that contains the whole or the start of a coded picture
   shall start at the location of the picture start code (PSC), and
   should normally be encapsulated with no extra copy of the picture
   header.  In other words, normally PLEN=0 in such a case.  However, if
   the coded picture contains an incomplete picture header (UFEP =
   "000"), then a representation of the complete (UFEP = "001") picture
   header may be attached during packetization in order to provide
   greater error resilience.  Thus, for packets that start at the
   location of a picture start code, PLEN shall be zero unless both of
   the following conditions apply:


   1) The picture header in the H.263+ bitstream payload is incomplete
      (PLUSPTYPE present and UFEP="000"), and

   2) The additional picture header which is attached is not incomplete
      (UFEP="001").

   A packet which begins at the location of a Picture, GOB, slice, EOS,
   or EOSBS start code shall omit the first two (all zero) bytes from
   the H.263+ bitstream, and signify their presence by setting P=1 in
   the payload header.

   Here is an example of encapsulating the first packet in a frame
   (without an attached redundant complete picture header):

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |   RR    |1|V|0|0|0|0|0|0|0|0|0| bitstream data without the    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | first two 0 bytes of the PSC                                ...
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
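
   On the receiving side, the inverse operation could look like the
   following sketch.  The function name is an assumption, and error
   handling is reduced to length checks.  The sketch skips the optional
   VRC byte and any extra picture header, then restores the two zero
   bytes when P=1.

```python
def depacketize(payload: bytes) -> bytes:
    """Return the H.263+ bitstream bytes carried by one RTP payload."""
    if len(payload) < 2:
        raise ValueError("payload is shorter than the basic payload header")
    word = int.from_bytes(payload[:2], "big")
    p = (word >> 10) & 1          # start code prefix was removed
    v = (word >> 9) & 1           # one VRC byte follows the basic header
    plen = (word >> 3) & 0x3F     # bytes of extra picture header
    offset = 2 + v + plen
    if len(payload) < offset:
        raise ValueError("payload is shorter than its payload header")
    data = payload[offset:]
    # Restore the two all-zero bytes that the sender omitted.
    return b"\x00\x00" + data if p else data


# First packet of a frame: P=1, V=0, PLEN=0, PEBIT=0, then the PSC tail.
restored = depacketize(b"\x04\x00\x80\x02")
```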

5.1.2 Packets that begin with GBSC or SSC

   For a packet that begins at the location of a GOB or slice start
   code, PLEN may be zero or may be nonzero, depending on whether a
   redundant picture header is attached to the packet.  In environments
   with very low packet loss rates, or when picture header contents are
   very seldom likely to change (except as can be detected from the GFID
   syntax of H.263+), a redundant copy of the picture header is not
   required. However, in less ideal circumstances a redundant picture
   header should be attached for enhanced error resilience, and its
   presence is indicated by PLEN>0.

   Assuming a PLEN of 9 and P=1, below is an example of a packet that
   begins with a byte-aligned GBSC or an SSC:

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |   RR    |1|V|0 0 1 0 0 1|PEBIT|1 0 0 0 0 0| picture header    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | starting with TR, PTYPE ...                                   |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | ...                                           | bitstream     |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | data starting with GBSC/SSC without its first two 0 bytes   ...
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   Notice that only the last six bits of the picture start code,
   '100000', are included in the payload header.  A complete H.263+
   picture header with byte aligned picture start code can be
   conveniently assembled if needed on the receiving end by prepending
   the sixteen leading '0' bits.
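
   On the receiving end, the reassembly described above might look
   like the following sketch.  The function names and the sample
   packet contents are hypothetical, and V=0 (no VRC byte) is assumed.

```python
def parse_packet(packet):
    """Split a packet into (p, extra_header, data); V=0 is assumed."""
    word = int.from_bytes(packet[:2], "big")
    p = (word >> 10) & 0x1
    v = (word >> 9) & 0x1
    plen = (word >> 3) & 0x3F
    if v:
        raise NotImplementedError("VRC byte not handled in this sketch")
    extra_header = packet[2:2 + plen]
    data = packet[2 + plen:]
    return p, extra_header, data


def restore_start_code(octets):
    """Prepend the sixteen leading zero bits that were omitted."""
    return b"\x00\x00" + octets


# Made-up packet: P=1, PLEN=9, PEBIT=0, 9 header bytes, then GOB data.
extra = bytes([0x80, 0x02, 0x0A, 0x10, 0x20, 0x30, 0x40, 0x50, 0x60])
packet = bytes([0x04, 0x48]) + extra + bytes([0x84, 0x11, 0x22])

p, extra_header, data = parse_packet(packet)
picture_header = restore_start_code(extra_header)
gob_data = restore_start_code(data) if p else data
```

   A real receiver would additionally honor PEBIT when interpreting the
   last byte of the extra picture header.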

5.1.3 Packets that Begin with an EOS or EOSBS Code

   For a packet that begins with an EOS or EOSBS code, PLEN shall be
   zero, and no Picture, GOB, or Slice start codes shall be included
   within the same packet.  As with other packets beginning with start
   codes, the two all-zero bytes that begin the EOS or EOSBS code at the
   beginning of the packet shall be omitted, and their presence shall be
   indicated by setting the P bit to 1 in the payload header.

   System designers should be aware that some decoders may interpret the
   loss of a packet containing only EOS or EOSBS information as the loss
   of essential video data and may thus respond by not displaying some
   subsequent video information.  Since EOS and EOSBS codes do not
   actually affect the decoding of video pictures, sending them is
   largely unnecessary.  Because the loss of such a packet (which can
   be detected by the sequence number) may be misinterpreted in this
   way, encoders are generally discouraged from sending EOS and EOSBS.

   Below is an example of a packet containing an EOS code:

      0                   1                   2
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |   RR    |1|V|0|0|0|0|0|0|0|0|0|1|1|1|1|1|1|0|0|
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
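
   The three bytes in the diagram above can be produced and recognized
   as in the following sketch.  The names are hypothetical; RR=0 and
   V=0 are assumed when building the packet, and the two trailing bits
   are taken to be zero as drawn in the diagram.

```python
# 16 zero bits, then '111111', then two zero bits as in the diagram.
EOS_CODE = bytes([0x00, 0x00, 0xFC])


def make_eos_packet():
    """Build a packet that carries only an EOS code."""
    header = (1 << 10).to_bytes(2, "big")  # P=1, all other fields zero
    return header + EOS_CODE[2:]           # first two zero bytes omitted


def is_eos_packet(packet):
    """True if P=1, PLEN=0 and the payload begins with '111111'."""
    word = int.from_bytes(packet[:2], "big")
    p = (word >> 10) & 0x1
    v = (word >> 9) & 0x1
    plen = (word >> 3) & 0x3F
    if not p or plen or len(packet) < 3 + v:
        return False
    return packet[2 + v] >> 2 == 0b111111


eos_packet = make_eos_packet()
```

   A receiver could use such a check to avoid treating an EOS-only
   packet as a carrier of picture data.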

5.2 Encapsulating Follow-On Packet (P=0)

   A Follow-on packet contains a number of bytes of coded H.263+ data
   which does not start at a synchronization point.  That is, a
   Follow-on packet does not start with a Picture, GOB, Slice, EOS, or
   EOSBS header, and it may or may not start at a macroblock boundary.
   Since Follow-on packets do not start at synchronization points, the
   data at the beginning of a Follow-on packet is not independently
   decodable.  For such packets, P=0 always.  If the packet preceding a
   Follow-on packet is lost, the receiver may discard that Follow-on
   packet as well as all subsequent Follow-on packets.  A better
   approach is for the receiver to scan the interior of the packet
   payload for start codes that can be used as resync points.  The use
   of an attached copy of a picture header for a follow-on packet is
   useful only if the interior of the packet or some subsequent follow-
   on packet contains a resync code such as a GOB or slice start code.
   PLEN>0 is allowed, since it may allow resync in the interior of the
   packet.  The decoder may also be resynchronized at the next segment
   or picture packet.

   Here is an example of a follow-on packet (with PLEN=0):

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |   RR    |0|V|0|0|0|0|0|0|0|0|0| bitstream data              ...
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
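
   The interior scan for resync points mentioned in this section could
   be sketched as below.  This is a simplified illustration with a
   hypothetical function name: it only finds start codes that are byte
   aligned, and it expects the caller to have already stripped the
   payload header.

```python
def find_resync_points(payload):
    """Return offsets of byte-aligned start codes inside the payload.

    A start code is taken to be two zero bytes followed by a byte
    whose most significant bit is 1.
    """
    offsets = []
    i = payload.find(b"\x00\x00")
    while i != -1 and i + 2 < len(payload):
        if payload[i + 2] & 0x80:
            offsets.append(i)
            i = payload.find(b"\x00\x00", i + 3)
        else:
            # Could be a longer run of zeros; retry one byte later.
            i = payload.find(b"\x00\x00", i + 1)
    return offsets


# Made-up Follow-on payload containing two start codes.
payload = bytes([0x12, 0x34, 0x00, 0x00, 0x84, 0x56,
                 0x00, 0x00, 0x00, 0x80, 0x02])
resync = find_resync_points(payload)
```

   Start codes that are not byte aligned would require a bit-level
   search, which this sketch does not attempt.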

6. Use of this payload specification

   There is no syntactical difference between a picture segment packet
   and a Follow-on packet, other than the indication P=1 for picture
   segment or sequence ending packets and P=0 for Follow-on packets.
   The following summarizes all packet types and the ways to
   distinguish between them.

   It is possible to distinguish between the different packet types by
   checking the P bit and the PLEN field of the payload header along
   with the first 6 bits of the payload.  The following table shows the
   packet type for permutations of this information (see also the
   picture/GOB/Slice header descriptions in H.263+ for details):

--------------+--------------+----------------------+-------------------
 First 6 bits | P-Bit | PLEN |  Packet              |  Remarks
 of Payload   |(payload hdr.)|                      |
--------------+--------------+----------------------+-------------------
 100000       |   1   |  0   |  Picture             |  Typical Picture
 100000       |   1   | > 0  |  Picture             |  Note UFEP
 1xxxxx       |   1   |  0   |  GOB/Slice/EOS/EOSBS |  See possible GNs
 1xxxxx       |   1   | > 0  |  GOB/Slice           |  See possible GNs
 xxxxxx       |   0   |  0   |  Follow-on           |
 xxxxxx       |   0   | > 0  |  Follow-on           |  Interior Resync
--------------+--------------+----------------------+-------------------

   The details regarding the possible values of the five bit Group
   Number (GN) field which follows the initial "1" bit when the P-bit is
   "1" for a GOB, Slice, EOS, or EOSBS packet are found in section 5.2.3
   of [4].
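
   The table above can be read as a small decision procedure.  The
   sketch below uses a hypothetical function name and returns the
   labels from the "Packet" column.  It assumes that a one-byte VRC
   field follows the payload header when V=1, and that the "first 6
   bits of payload" are those following any extra picture header.

```python
def classify_packet(packet):
    """Return the packet type named in the table for one RTP payload."""
    word = int.from_bytes(packet[:2], "big")
    p = (word >> 10) & 0x1
    v = (word >> 9) & 0x1
    plen = (word >> 3) & 0x3F
    if p == 0:
        return "Follow-on"
    # Skip the 2-byte header, the VRC byte (if V=1) and any extra
    # picture header to reach the first byte of bitstream data.
    first6 = packet[2 + v + plen] >> 2
    if first6 == 0b100000:
        return "Picture"
    if first6 & 0b100000:
        return "GOB/Slice/EOS/EOSBS" if plen == 0 else "GOB/Slice"
    return "Invalid"  # P=1 but the payload does not begin with '1'


# Made-up payloads, one per kind of table row.
picture = classify_packet(bytes([0x04, 0x00, 0x80]))
ending = classify_packet(bytes([0x04, 0x00, 0xFC]))
segment = classify_packet(bytes([0x04, 0x08, 0x80, 0x84]))
follow_on = classify_packet(bytes([0x00, 0x00, 0x12]))
```

   Telling GOB, Slice, EOS, and EOSBS apart requires the GN values
   referred to above and is left out of this sketch.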

   As defined in this specification, every start of a coded frame (as
   indicated by the presence of a PSC) has to be encapsulated as a
   picture segment packet.  If the whole coded picture fits into one
   packet of reasonable size (which is dependent on the connection
   characteristics), this is the only type of packet that may need to be
   used.  Due to the high compression ratio achieved by H.263+ it is
   often possible to use this mechanism, especially for small spatial
   picture formats such as QCIF and typical Internet packet sizes around
   1500 bytes.

   If the complete coded frame does not fit into a single packet, one
   of two packetization approaches may be chosen.  When the packet loss
   probability is very low or zero, one or more Follow-on packets may
   be used to carry the rest of the picture.  Doing so leads to
   minimal coding and packetization overhead as well as to an optimal
   use of the maximal packet size, but does not provide any added error
   resilience.

   The alternative is to break the picture into reasonably small
   partitions - called Segments - (by using the Slice or GOB mechanism),
   that do offer synchronization points.  By doing so and using the
   Picture Segment payload with PLEN>0, decoding of the transmitted
   packets is possible even in such cases in which the Picture packet
   containing the picture header was lost (provided any necessary
   reference picture is available). Picture Segment packets can also be
   used in conjunction with Follow-on packets for large segment sizes.
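
   The first approach (one picture segment packet followed by
   Follow-on packets) could be sketched as below.  The function name,
   the frame contents, and the packet size are hypothetical; RR=0, V=0
   and PLEN=0 are assumed, and the frame is cut at arbitrary byte
   boundaries without regard to GOB or slice structure.

```python
def packetize_frame(frame, max_packet_size):
    """Split one coded frame into a P=1 packet plus Follow-on packets."""
    if frame[:2] != b"\x00\x00":
        raise ValueError("frame does not begin with a start code")
    room = max_packet_size - 2       # space left after the payload header
    if room <= 0:
        raise ValueError("max_packet_size too small")
    body = frame[2:]                 # PSC without its two zero bytes
    packets = []
    for offset in range(0, len(body), room):
        p = 1 if offset == 0 else 0  # only the first packet has P=1
        header = (p << 10).to_bytes(2, "big")
        packets.append(header + body[offset:offset + room])
    return packets


frame = b"\x00\x00" + bytes(range(0x80, 0x8A))  # made-up 12-byte frame
packets = packetize_frame(frame, max_packet_size=6)
rebuilt = b"\x00\x00" + b"".join(pkt[2:] for pkt in packets)
```

   The segment-based alternative would instead cut the frame at GOB or
   slice start codes and emit each piece as its own P=1 packet.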

7. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [1], and any appropriate RTP profile (for example [2]).
   This implies that confidentiality of the media streams is achieved by
   encryption.  Because the data compression used with this payload
   format is applied end-to-end, encryption may be performed after
   compression so there is no conflict between the two operations.

   A potential denial-of-service threat exists for data encodings using
   compression techniques that have non-uniform receiver-end
   computational load.  The attacker can inject pathological datagrams
   into the stream which are complex to decode and cause the receiver to
   be overloaded.  However, this encoding does not exhibit any
   significant non-uniformity.

   As with any IP-based protocol, in some circumstances a receiver may
   be overloaded simply by the receipt of too many packets, either
   desired or undesired.  Network-layer authentication may be used to
   discard packets from undesired sources, but the processing cost of
   the authentication itself may be too high.  In a multicast
   environment, pruning of specific sources may be implemented in future
   versions of IGMP and in multicast routing protocols to allow a
   receiver to select which sources are allowed to reach it.

   A security review of this payload format found no additional
   considerations beyond those in the RTP specification.

8. Addresses of Authors

   Carsten Bormann
   Universitaet Bremen FB3 TZI      EMail: cabo@tzi.org
   Postfach 330440                  Phone: +49.421.218-7024
   D-28334 Bremen, GERMANY          Fax:   +49.421.218-7000


   Linda Cline
   Intel Corp. M/S JF3-206          EMail: lscline@jf.intel.com
   2111 NE 25th Avenue              Phone: +1 503 264 3501
   Hillsboro, OR 97124, USA         Fax:   +1 503 264 3483


   Gim Deisher
   Intel Corp. M/S JF2-78           EMail: gim.l.deisher@intel.com
   2111 NE 25th Avenue              Phone: +1 503 264 3758
   Hillsboro, OR 97124, USA         Fax:   +1 503 264 9372


   Tom Gardos
   Intel Corp. M/S JF2-78           EMail: thomas.r.gardos@intel.com
   2111 NE 25th Avenue              Phone: +1 503 264 6459
   Hillsboro, OR 97124, USA         Fax:   +1 503 264 9372


   Christian Maciocco
   Intel Corp. M/S JF3-206          EMail: christian.maciocco@intel.com
   2111 NE 25th Avenue              Phone: +1 503 264 1770
   Hillsboro, OR 97124, USA         Fax:   +1 503 264 9428


   Donald Newell
   Intel Corp. M/S JF3-206          EMail: donald.newell@intel.com
   2111 NE 25th Avenue              Phone: +1 503 264 9234
   Hillsboro, OR 97124, USA         Fax:   +1 503 264 9428


   Joerg Ott
   Universitaet Bremen FB3 TZI      EMail: jo@tzi.org
   Postfach 330440                  Phone: +49.421.218-7024
   D-28334 Bremen, GERMANY          Fax:   +49.421.218-7000


   Gary Sullivan
   PictureTel Corp. M/S 635         EMail: garys@pictel.com
   100 Minuteman Road               Phone: +1 978 623 4324
   Andover, MA 01810, USA           Fax:   +1 978 749 2804


   Stephan Wenger
   Technische Universitaet Berlin FB13
   Sekr. FR 6-3                     EMail: stewe@cs.tu-berlin.de
   Franklinstr. 28/29               Phone: +49.30.314-73160
   D-10587 Berlin, GERMANY          Fax:   +49.30.314-25156


   Chad Zhu
   Intel Corp. M/S JF3-202          EMail: czhu@ix.netcom.com
   2111 NE 25th Avenue              Phone: +1 503 264 6004
   Hillsboro, OR 97124, USA         Fax:   +1 503 264 1805

9. References

   [1] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
       "RTP: A Transport Protocol for Real-Time Applications", RFC
       1889, January 1996.

   [2] Schulzrinne, H., "RTP Profile for Audio and Video Conferences
       with Minimal Control", RFC 1890, January 1996.

   [3] "Video Coding for Low Bit Rate Communication", ITU-T
       Recommendation H.263, March 1996.

   [4] "Video Coding for Low Bit Rate Communication", ITU-T
       Recommendation H.263, January 1998.

   [5] Turletti, T. and C. Huitema, "RTP Payload Format for H.261 Video
       Streams", RFC 2032, October 1996.

   [6] Zhu, C., "RTP Payload Format for H.263 Video Streams", RFC 2190,
       September 1997.

   [7] Wenger, S., "Video Redundancy Coding in H.263+", Proc. Audio-
       Visual Services over Packet Networks, Aberdeen, U.K., September
       1997.


10.  Full Copyright Statement

   Copyright (C) The Internet Society (1998).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

