Internet Engineering Task Force (IETF)                          C. Hopps
Request for Comments: 9347                       LabN Consulting, L.L.C.
Category: Standards Track                                   January 2023
ISSN: 2070-1721


 Aggregation and Fragmentation Mode for Encapsulating Security Payload
        (ESP) and Its Use for IP Traffic Flow Security (IP-TFS)

Abstract

   This document describes a mechanism for aggregation and fragmentation
   of IP packets when they are being encapsulated in Encapsulating
   Security Payload (ESP).  This new payload type can be used for
   various purposes, such as decreasing encapsulation overhead for small
   IP packets; however, the focus in this document is to enhance IP
   Traffic Flow Security (IP-TFS) by adding Traffic Flow Confidentiality
   (TFC) to encrypted IP-encapsulated traffic.  TFC is provided by
   obscuring the size and frequency of IP traffic using a fixed-size,
   constant-send-rate IPsec tunnel.  The solution allows for congestion
   control, as well as nonconstant send-rate usage.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Further information on
   Internet Standards is available in Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc9347.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Revised BSD License text as described in Section 4.e of the
   Trust Legal Provisions and are provided without warranty as described
   in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology & Concepts
   2.  The AGGFRAG Tunnel
     2.1.  Tunnel Content
     2.2.  Payload Content
       2.2.1.  DataBlocks
       2.2.2.  End Padding
       2.2.3.  Fragmentation, Sequence Numbers, and All-Pad Payloads
       2.2.4.  Empty Payload
       2.2.5.  IP Header Value Mapping
       2.2.6.  IPv4 Time To Live (TTL), IPv6 Hop Limit, and ICMP
               Messages
       2.2.7.  Effective MTU of the Tunnel
     2.3.  Exclusive SA Use
     2.4.  Modes of Operation
       2.4.1.  Non-Congestion-Controlled Mode
       2.4.2.  Congestion-Controlled Mode
     2.5.  Summary of Receiver Processing
   3.  Congestion Information
     3.1.  ECN Support
   4.  Configuration of AGGFRAG Tunnels for IP-TFS
     4.1.  Bandwidth
     4.2.  Fixed Packet Size
     4.3.  Congestion Control
   5.  IKEv2
     5.1.  USE_AGGFRAG Notification Message
   6.  Packet and Data Formats
     6.1.  AGGFRAG_PAYLOAD Payload
       6.1.1.  Non-Congestion-Control AGGFRAG_PAYLOAD Payload Format
       6.1.2.  Congestion Control AGGFRAG_PAYLOAD Payload Format
       6.1.3.  Data Blocks
       6.1.4.  IKEv2 USE_AGGFRAG Notification Message
   7.  IANA Considerations
     7.1.  ESP Next Header Value
     7.2.  AGGFRAG_PAYLOAD Sub-Types
     7.3.  USE_AGGFRAG Notify Message Status Type
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Example of an Encapsulated IP Packet Flow
   Appendix B.  A Send and Loss Event Rate Calculation
   Appendix C.  Comparisons of IP-TFS
     C.1.  Comparing Overhead
       C.1.1.  IP-TFS Overhead
       C.1.2.  ESP with Padding Overhead
     C.2.  Overhead Comparison
     C.3.  Comparing Available Bandwidth
       C.3.1.  Ethernet
   Acknowledgements
   Contributors
   Author's Address

1.  Introduction

   Traffic analysis [RFC4301] [AppCrypt] is the act of extracting
   information about data being sent through a network.  While
   encryption [RFC4303] directly obscures the data, the patterns in the
   message traffic may expose information due to variations in its shape
   and timing [RFC8546] [AppCrypt].  Hiding the size and frequency of
   traffic is referred to as Traffic Flow Confidentiality (TFC), per
   [RFC4303].

   [RFC4303] provides for TFC by allowing padding to be added to
   encrypted IP packets and allowing for transmission of all-pad packets
   (indicated using protocol 59).  This method has the major limitation
   that it can significantly underutilize the available bandwidth.

   This document defines an aggregation and fragmentation (AGGFRAG) mode
   for ESP, as well as ESP's use for IP Traffic Flow Security (IP-TFS).
   This solution provides for full TFC without the aforementioned
   bandwidth limitation.  This is accomplished by using a constant-send-
   rate IPsec [RFC4303] tunnel with fixed-size encapsulating packets;
   however, these fixed-size packets can contain partial, whole, or
   multiple IP packets to maximize the bandwidth of the tunnel.  A
   nonconstant send rate is allowed, but the confidentiality properties
   of its use are outside the scope of this document.

   For a comparison of the overhead of IP-TFS with the TFC solution
   prescribed in [RFC4303], see Appendix C.

   Additionally, IP-TFS provides for operating fairly within congested
   networks [RFC2914].  This is important for when the IP-TFS user is
   not in full control of the domain through which the IP-TFS tunnel
   path flows.

   The mechanisms, such as the AGGFRAG mode, defined in this document
   are generic with the intent of allowing for non-TFS uses, but such
   uses are outside the scope of this document.

1.1.  Terminology & Concepts

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   This document assumes familiarity with IP security concepts,
   including TFC, as described in [RFC4301].

2.  The AGGFRAG Tunnel

   As mentioned in Section 1, the AGGFRAG mode utilizes an IPsec
   [RFC4303] tunnel as its transport.  For the purpose of IP-TFS, fixed-
   size encapsulating packets are sent at a constant rate on the AGGFRAG
   tunnel.

   The primary input to the tunnel algorithm is the requested bandwidth
   to be used by the tunnel.  Two values are then required to provide
   for this bandwidth use: the fixed size of the encapsulating packets
   and the rate at which to send them.

   The fixed packet size MAY either be specified manually or be
   determined through other methods, such as the Packetization Layer MTU
   Discovery (PLMTUD) [RFC4821] [RFC8899] or Path MTU Discovery (PMTUD)
   [RFC1191] [RFC8201].  PMTUD is known to have issues, so PLMTUD is
   considered the more robust option.  For PLMTUD, congestion control
   payloads can be used as in-band probes (see Section 6.1.2 and
   [RFC8899]).

   Given the encapsulating packet size and the requested bandwidth to be
   used, the corresponding packet send rate can be calculated.  The
   packet send rate is the requested bandwidth divided by the size of
   the encapsulating packet.
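
   As a minimal sketch of this calculation, assuming the bandwidth is
   expressed in bits per second and the packet size in octets (units
   this document does not mandate), the send rate and the interval
   between packets could be computed as follows.  The function names
   are illustrative only.

```python
def packet_send_rate(bandwidth_bps: float, packet_size_octets: int) -> float:
    """Packets per second needed to occupy bandwidth_bps.

    Assumes packet_size_octets is the full size of one encapsulating
    packet as counted against the requested bandwidth.
    """
    if packet_size_octets <= 0:
        raise ValueError("packet size must be positive")
    return bandwidth_bps / (packet_size_octets * 8)


def send_interval_seconds(bandwidth_bps: float, packet_size_octets: int) -> float:
    """Time between consecutive encapsulating packets."""
    return 1.0 / packet_send_rate(bandwidth_bps, packet_size_octets)
```

   For example, a requested bandwidth of 12 Mbit/s with 1500-octet
   encapsulating packets corresponds to 1000 packets per second.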

   The egress (receiving) side of the AGGFRAG tunnel MUST allow for and
   expect the ingress (sending) side of the AGGFRAG tunnel to vary the
   size and rate of sent encapsulating packets, unless constrained by
   other policy.

2.1.  Tunnel Content

   As previously mentioned, one issue with the TFC padding solution in
   [RFC4303] is the large amount of wasted bandwidth, as only one IP
   packet can be sent per encapsulating packet.  In order to maximize
   bandwidth, IP-TFS breaks this one-to-one association by introducing
   an AGGFRAG mode for ESP.

   The AGGFRAG mode aggregates and fragments the inner IP traffic flow
   into encapsulating IPsec tunnel packets.  For IP-TFS, the IPsec
   encapsulating tunnel packets are a fixed size.  Padding is only added
   to the tunnel packets if there is no data available to be sent at the
   time of tunnel packet transmission or if fragmentation has been
   disabled by the receiver.

   This is accomplished using a new Encapsulating Security Payload (ESP)
   [RFC4303] Next Header field value AGGFRAG_PAYLOAD (Section 6.1).

   Other non-IP-TFS uses of this AGGFRAG mode have been suggested, such
   as increased performance through packet aggregation, as well as
   handling MTU issues using fragmentation.  These uses are not defined
   here but are also not restricted by this document.

2.2.  Payload Content

   The AGGFRAG_PAYLOAD payload content defined in this document consists
   of a 4- or 24-octet header, followed by either a partial data block,
   a full data block, or multiple partial or full data blocks.  The
   following diagram illustrates this payload within the ESP packet.
   See Section 6.1 for the exact formats of the AGGFRAG_PAYLOAD payload.

    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
    . Outer Encapsulating Header ...                                .
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
    . ESP Header...                                                 .
    +---------------------------------------------------------------+
    |   [AGGFRAG sub-type/flags]   :           BlockOffset          |
    +---------------------------------------------------------------+
    :                  [Optional Congestion Info]                   :
    +---------------------------------------------------------------+
    |       DataBlocks ...                                          ~
    ~                                                               ~
    ~                                                               |
    +---------------------------------------------------------------|
    . ESP Trailer...                                                .
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

              Figure 1: Layout of an AGGFRAG Mode IPsec Packet

   The BlockOffset value is either zero or some offset into or past the
   end of the DataBlocks data.

   If the BlockOffset value is zero, it means that the DataBlocks data
   begins with a new data block.

   Conversely, if the BlockOffset value is non-zero, it points to the
   start of the new data block, and the initial DataBlocks data belongs
   to the data block that is still being reassembled.

   If the BlockOffset points past the end of the DataBlocks data, then
   the next data block occurs in a subsequent encapsulating packet.

   Having the BlockOffset always point at the next available data block
   allows for recovering the next inner packet in the presence of outer
   encapsulating packet loss.
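
   The BlockOffset rules above can be sketched as a small helper that
   splits the DataBlocks octets of one payload.  This is an
   illustration under stated assumptions, not a normative algorithm:
   the names PayloadSplit and split_by_block_offset are hypothetical,
   and the header is assumed to have been parsed already.

```python
from typing import NamedTuple


class PayloadSplit(NamedTuple):
    continuation: bytes   # tail of a data block begun in an earlier packet
    new_blocks: bytes     # octets starting at the first new data block
    carry_over: int       # octets to skip in later packets before the
                          # next new data block begins


def split_by_block_offset(block_offset: int, data_blocks: bytes) -> PayloadSplit:
    if block_offset < 0:
        raise ValueError("BlockOffset cannot be negative")
    if block_offset == 0:
        # The DataBlocks data begins with a new data block.
        return PayloadSplit(b"", data_blocks, 0)
    if block_offset >= len(data_blocks):
        # Points past the end: everything here continues the block
        # being reassembled; the next block is in a later packet.
        return PayloadSplit(data_blocks, b"", block_offset - len(data_blocks))
    # Points inside: a continuation followed by a new data block.
    return PayloadSplit(data_blocks[:block_offset], data_blocks[block_offset:], 0)
```

   A receiver that has lost an outer packet can discard the
   continuation octets and resume parsing at new_blocks.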

   An example AGGFRAG mode packet flow can be found in Appendix A.

2.2.1.  DataBlocks

    +---------------------------------------------------------------+
    | Type  | rest of IPv4, IPv6, or pad...
    +--------

                      Figure 2: Layout of a Data Block

   A data block is defined by a 4-bit type code, followed by the data
   block data.  The type values have been carefully chosen to coincide
   with the IPv4/IPv6 version field values so that no per-data block
   type overhead is required to encapsulate an IP packet.  Likewise, the
   length of the data block is extracted from the encapsulated IPv4's
   Total Length or IPv6's Payload Length fields.
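
   A hedged sketch of how a receiver might derive the type and length
   of a data block follows.  The type codes are those given in
   Section 6.1.3; the function name is hypothetical, and returning
   None when the inner header is itself incomplete is a choice made
   for this example.  IPv6 jumbograms are not handled.

```python
import struct
from typing import Optional

# Type codes as listed in Section 6.1.3 of this document.
TYPE_PAD = 0x0
TYPE_IPV4 = 0x4
TYPE_IPV6 = 0x6
IPV6_FIXED_HEADER_LEN = 40  # Payload Length excludes the fixed header


def data_block_length(data: bytes) -> Optional[int]:
    """Length in octets of the data block starting at data[0].

    Returns None when too few header octets are present to tell
    (the inner header was split across outer packets).
    """
    if not data:
        return None
    block_type = data[0] >> 4  # upper 4 bits of the first octet
    if block_type == TYPE_PAD:
        return len(data)  # padding runs to the end of the payload
    if block_type == TYPE_IPV4:
        if len(data) < 4:
            return None
        return struct.unpack_from("!H", data, 2)[0]  # Total Length
    if block_type == TYPE_IPV6:
        if len(data) < 6:
            return None
        payload_len = struct.unpack_from("!H", data, 4)[0]
        return IPV6_FIXED_HEADER_LEN + payload_len
    raise ValueError(f"unknown data block type {block_type:#x}")
```

   Because the type nibble is the IP version nibble, an inner packet
   is copied into the payload unchanged.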

2.2.2.  End Padding

   Since a data block's type is identified in its first 4 bits, the only
   time padding is required is when there is no data to encapsulate.
   For this end padding, a Pad Data Block is used.

2.2.3.  Fragmentation, Sequence Numbers, and All-Pad Payloads

   In order for a receiver to reassemble fragmented inner packets, the
   sender MUST send the inner packet fragments back to back in the
   logical outer packet stream (i.e., using consecutive ESP sequence
   numbers).  However, the sender is allowed to insert "all-pad"
   payloads (i.e., payloads with a BlockOffset of zero and a single pad
   data block) in between the packets carrying the inner packet
   fragment payloads.  This interleaving of all-pad payloads allows the
   sender to always send a tunnel packet, regardless of the
   encapsulation computational requirements.

   When a receiver is reassembling an inner packet, and it receives an
   "all-pad" payload, it increments the expected sequence number that
   the next inner packet fragment is expected to arrive in.

   Given the above, the receiver will need to handle out-of-order
   arrival of outer ESP packets prior to reassembly processing.  ESP
   already provides for optionally detecting replay attacks.  Detecting
   replay attacks normally utilizes a window method.  A similar
   sequence-number-based sliding window can be used to correct
   reordering of the outer packet stream.  Receiving a larger (newer)
   sequence number packet advances the window, and if any older ESP
   packets whose sequence numbers the window has passed by are received,
   then the packets are dropped.  A good choice for the size of this
   window depends on the amount of misordering the user is experiencing;
   however, a value of 3 has been suggested as a default when no more
   informed choice exists.

   As the amount of misordering that may be present is hard to predict,
   the window size SHOULD be configurable by the user.  Implementations
   MAY also dynamically adjust the reordering window based on actual
   misordering seen in arriving packets.
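
   One possible shape for such a reorder window is sketched below.  It
   is an illustration only: the class name, the choice to report a
   skipped sequence number as a None payload, and the omission of
   sequence number wraparound and extended sequence numbers are all
   assumptions of this example.

```python
from typing import Dict, List, Optional, Tuple


class ReorderWindow:
    """Releases outer packets in sequence order, tolerating reordering
    of up to `size` packets before declaring a gap lost."""

    def __init__(self, size: int = 3, first_seq: int = 1) -> None:
        self.size = size
        self.next_seq = first_seq          # next sequence number to release
        self.held: Dict[int, bytes] = {}   # early arrivals awaiting a gap

    def push(self, seq: int, payload: bytes) -> List[Tuple[int, Optional[bytes]]]:
        """Return (seq, payload) pairs now in order; payload None = lost."""
        if seq < self.next_seq or seq in self.held:
            return []  # the window has passed it, or duplicate: drop
        self.held[seq] = payload
        released: List[Tuple[int, Optional[bytes]]] = []
        # A newer packet advances the window; slots left behind are
        # released if held, or reported as lost if never received.
        while self.next_seq < seq - self.size:
            released.append((self.next_seq, self.held.pop(self.next_seq, None)))
            self.next_seq += 1
        # Release the run of consecutive packets now available.
        while self.next_seq in self.held:
            released.append((self.next_seq, self.held.pop(self.next_seq)))
            self.next_seq += 1
        return released
```

   Reassembly of inner packets would then consume the released
   sequence, abandoning any partially reassembled inner packet when a
   lost slot is reported.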

   Please note, when IP-TFS sends a continuous stream of packets, there
   is no requirement for an explicit lost packet timer; however, using a
   lost packet timer is RECOMMENDED.  If an implementation does not use
   a lost packet timer and only considers an outer packet lost when the
   reorder window moves by it, the inner traffic can be delayed by up to
   the reorder window size times the per-packet send interval (the
   inverse of the send rate).  This delay could be significant for
   slower send rates or when larger reorder window sizes are in use.  As
   the lost packet timer affects the delay of inner packet delivery, an
   implementation or user could choose to set it proportionate to the
   tunnel rate.
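
   As a rough worked example of the delay bound described above, the
   worst case without a lost packet timer is the reorder window size
   multiplied by the time between packets.  The function name is
   hypothetical, and the rate is assumed to be in packets per second.

```python
def max_reorder_delay_seconds(window_size: int, packets_per_second: float) -> float:
    """Upper bound on the added delay when a loss is only detected by
    the reorder window moving past the missing packet."""
    if packets_per_second <= 0:
        raise ValueError("send rate must be positive")
    return window_size / packets_per_second
```

   With a window of 3, a tunnel sending 1000 packets per second adds
   at most 3 ms, whereas a tunnel sending 10 packets per second adds
   up to 300 ms.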

   While ESP guarantees an increasing sequence number with subsequently
   sent packets, it does not actually require the sequence numbers to be
   generated consecutively (e.g., sending only even-numbered sequence
   numbers would be allowed, as long as they are always increasing).
   Gaps in the sequence numbers will not work for this document, so the
   sequence number stream MUST increase monotonically by 1 for each
   subsequent packet.

   When using the AGGFRAG_PAYLOAD in conjunction with replay detection,
   the window size for both MAY be reduced to the smaller of the two
   window sizes.  This is because packets outside of the smaller window
   but inside the larger window would still be dropped by the mechanism
   with the smaller window size.  However, there is also no requirement
   to make these values the same.  Indeed, in some cases, such as slow
   tunnels where a very small or zero reorder window size is
   appropriate, the user may still want a large replay detection window
   to log replayed packets.  Additionally, large replay windows can be
   implemented with very little overhead, compared to large reorder
   windows.

   Finally, as sequence numbers are reset when switching Security
   Associations (SAs) (e.g., when rekeying a Child SA), senders MUST NOT
   send initial fragments of an inner packet using one SA and subsequent
   fragments in a different SA.

      |  A note on BlockOffset values: Senders MUST encode the
      |  BlockOffset consistently with the immediately preceding non-
      |  all-pad payload packet.  Specifically, if the immediately
      |  preceding non-all-pad payload packet ended with a Pad Data
      |  Block, this BlockOffset MUST be zero, as Pad Data Blocks are
      |  never fragmented.  The BlockOffset MUST be consistent with the
      |  remaining size implied by the length field from the fragmented
      |  inner packet.

2.2.3.1.  Optional Extra Padding

   When the tunnel bandwidth is not being fully utilized, a sender MAY
   pad out the current encapsulating packet in order to deliver an inner
   packet unfragmented in the following outer packet.  The benefit would
   be to avoid inner packet fragmentation in the presence of a bursty
   offered load (non-bursty traffic will naturally not fragment).
   Senders MAY also choose to allow for a minimum fragment size to be
   configured (e.g., as a percentage of the AGGFRAG_PAYLOAD payload
   size) to avoid fragmentation at the cost of tunnel bandwidth.  The
   costs with these methods are complexity and an added delay of inner
   traffic.  The main advantage to avoiding fragmentation is to minimize
   inner packet loss in the presence of outer packet loss.  When this is
   worthwhile (e.g., how much loss and what type of loss is required,
   given different inner traffic shapes and utilization, for this to
   make sense) and what values to use for the allowable/added delay may
   be worth researching but is outside the scope of this document.

   While use of padding to avoid fragmentation does not impact
   interoperability, if padding is used inappropriately, it can reduce
   the effective throughput of a tunnel.  Senders implementing either of
   the above approaches will need to take care to not reduce the
   effective capacity, and overall utility, of the tunnel through the
   overuse of padding.

2.2.4.  Empty Payload

   To support reporting of congestion control information (described
   later) using a non-AGGFRAG_PAYLOAD-enabled SA, it is allowed to send
   an AGGFRAG_PAYLOAD payload with no data blocks (i.e., the ESP payload
   length is equal to the AGGFRAG_PAYLOAD header length).  This special
   payload is called an empty payload.

   Currently, this situation is only applicable in use cases without
   Internet Key Exchange Protocol Version 2 (IKEv2).

2.2.5.  IP Header Value Mapping

   [RFC4301] provides some direction on when and how to map various
   values from an inner IP header to the outer encapsulating header,
   namely the Don't Fragment (DF) bit [RFC0791], the Differentiated
   Services (DS) field [RFC2474], and the Explicit Congestion
   Notification (ECN) field [RFC3168].  Unlike in [RFC4301], the AGGFRAG
   mode may, and often will, be encapsulating more than one IP packet
   per ESP packet.  To deal with this, these mappings are restricted
   further.

2.2.5.1.  DF Bit

   The AGGFRAG mode never maps the inner DF bit, as it is unrelated to
   the AGGFRAG tunnel functionality; the AGGFRAG mode never needs to IP
   fragment the inner packets, and the inner packets will not affect the
   fragmentation of the outer encapsulation packets.

2.2.5.2.  ECN Value

   The ECN value need not be mapped, as any congestion related to the
   constant-send-rate IP-TFS tunnel is unrelated (by design) to the
   inner traffic flow.  The sender MAY still set the ECN value of inner
   packets based on the normal ECN specification [RFC3168] [RFC4301]
   [RFC6040].

2.2.5.3.  DS Field

   By default, the DS field SHOULD NOT be copied, although a sender MAY
   choose to allow for configuration to override this behavior.  A
   sender SHOULD also allow the DS value to be set by configuration.

2.2.6.  IPv4 Time To Live (TTL), IPv6 Hop Limit, and ICMP Messages

   How to modify the inner packet IPv4 TTL [RFC0791] or IPv6 Hop Limit
   [RFC8200] is specified in [RFC4301].

   [RFC4301] specifies how to apply policy to authenticated and
   unauthenticated ICMP error packets (e.g., Destination Unreachable)
   arriving at or being forwarded through the endpoint, in particular,
   whether to process, ignore, or forward said packets.  With the one
   exception described below, this document does not change the
   handling of these packets; they should be handled as specified in
   [RFC4301].

   The one way in which an AGGFRAG tunnel differs in ICMP error packet
   mechanics is with PMTU.  When fragmentation is enabled on the AGGFRAG
   tunnel, then no ICMP "Too Big" errors need to be generated for
   arriving ingress traffic, as the arriving inner packets will be
   naturally fragmented by the AGGFRAG encapsulation.

   Otherwise, when fragmentation has been disabled on the AGGFRAG
   tunnel, then the treatment of arriving inner traffic exactly maps to
   that of a non-AGGFRAG ESP tunnel.  Explicitly, IPv4 with DF set and
   IPv6 packets that cannot fit in their own outer packet payload will
   generate the appropriate ICMP "Too Big" error, as described in
   [RFC4301], and IPv4 packets without DF set will be IP fragmented, as
   described in [RFC4301].
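
   The ingress decision described in the two paragraphs above can be
   summarized in a short sketch.  The names used here, including
   max_inner_len for the largest inner packet that fits in a single
   outer packet payload, are assumptions of this example rather than
   terms defined by this document.

```python
from enum import Enum


class IngressAction(Enum):
    ENCAPSULATE = "encapsulate"      # hand the packet to AGGFRAG as is
    ICMP_TOO_BIG = "icmp-too-big"    # reject and signal the source
    IP_FRAGMENT = "ip-fragment"      # IP fragment before encapsulation


def ingress_action(ip_version: int, df_set: bool, packet_len: int,
                   max_inner_len: int,
                   aggfrag_fragmentation: bool) -> IngressAction:
    if aggfrag_fragmentation:
        # The AGGFRAG encapsulation fragments naturally; no error needed.
        return IngressAction.ENCAPSULATE
    if packet_len <= max_inner_len:
        return IngressAction.ENCAPSULATE
    # Fragmentation disabled and the packet does not fit: behave as a
    # non-AGGFRAG ESP tunnel would.
    if ip_version == 6 or df_set:
        return IngressAction.ICMP_TOO_BIG
    return IngressAction.IP_FRAGMENT
```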

   Packets egressing the tunnel continue to be handled as specified in
   [RFC4301].

   All other aspects of PMTU and the handling of ICMP "Too Big" messages
   (i.e., with regards to the outer AGGFRAG/ESP tunnel packet size) also
   remain unchanged from [RFC4301].

2.2.7.  Effective MTU of the Tunnel

   Unlike in [RFC4301], there is normally no effective MTU (EMTU) on an
   AGGFRAG tunnel, as all IP packet sizes are properly transmitted
   without requiring IP fragmentation prior to tunnel ingress.  That
   said, a sender MAY allow for explicitly configuring an MTU for the
   tunnel.

   If fragmentation has been disabled on the AGGFRAG tunnel, then the
   tunnel's EMTU and behaviors are the same as normal IPsec tunnels
   [RFC4301].

2.3.  Exclusive SA Use

   This document does not specify mixed use of an AGGFRAG_PAYLOAD-
   enabled SA.  A sender MUST only send AGGFRAG_PAYLOAD payloads over an
   SA configured for AGGFRAG mode.

2.4.  Modes of Operation

   Just as with normal IPsec/ESP SAs, AGGFRAG SAs are unidirectional.
   Bidirectional IP-TFS functionality is achieved by setting up 2
   AGGFRAG SAs, one in either direction.

   An AGGFRAG tunnel used for IP-TFS can operate in 2 modes, a non-
   congestion-controlled mode and congestion-controlled mode.

2.4.1.  Non-Congestion-Controlled Mode

   In the non-congestion-controlled mode, IP-TFS sends fixed-size
   packets over an AGGFRAG tunnel at a constant rate.  The packet send
   rate is constant and is not automatically adjusted, regardless of any
   network congestion (e.g., packet loss).

   For similar reasons as given in [RFC7510], the non-congestion-
   controlled mode MUST only be used where the user has full
   administrative control over any path the tunnel will take and MUST
   NOT be used if this is not the case.  This is required so the user
   can guarantee the bandwidth and be sure not to worsen network
   congestion [RFC2914].  In this case, packet loss
   should be reported to the administrator (e.g., via syslog, YANG
   notification, SNMP traps, etc.) so that any failures due to a lack of
   bandwidth can be corrected.  The use of circuit breakers is also
   RECOMMENDED (Section 2.4.2.1).

   Users that choose the non-congestion-controlled mode need to
   understand that this mode will send packets at a constant rate,
   utilizing a constant, fixed bandwidth, and will not adjust based on
   congestion.  Thus, if they do not guarantee the bandwidth required by
   the tunnel, the tunnel's operation, as well as the rest of their
   network, may be negatively impacted.

   One expected use case for the non-congestion-controlled mode is to
   guarantee the full tunnel bandwidth is available and preferred over
   other non-tunnel traffic.  In fact, a typical site-to-site use case
   might have all of the user traffic utilizing the IP-TFS tunnel.

   The non-congestion-controlled mode is also appropriate if ESP over
   TCP is in use [RFC9329].  However, the use of TCP is considered a
   fallback-only solution for IPsec and is strongly disfavored.  This
   is also one of the reasons that TCP was not chosen as the
   encapsulation for IP-TFS instead of AGGFRAG.

2.4.2.  Congestion-Controlled Mode

   With the congestion-controlled mode, IP-TFS adapts to network
   congestion by lowering the packet send rate to accommodate the
   congestion, as well as raising the rate when congestion subsides.
   Since overhead is per packet, by allowing for maximal fixed-size
   packets and varying the send rate, transport overhead is minimized.

   The output of the congestion control algorithm will adjust the rate
   at which the ingress sends packets.  While this document does not
   require a specific congestion control algorithm, best current
   practice RECOMMENDS that the algorithm conform to [RFC5348].
   Congestion control principles are documented in [RFC2914] as well.
   There is an example in [RFC4342] of the algorithm in [RFC5348], which
   matches the requirements of IP-TFS (i.e., designed for fixed-size
   packets and send rate varied based on congestion).

   The required inputs for the TCP-friendly rate control algorithm
   described in [RFC5348] are the receiver's loss event rate and the
   sender's estimated round-trip time (RTT).  These values are provided
   by IP-TFS using the congestion information header fields described in
   Section 3.  In particular, these values are sufficient to implement
   the algorithm described in [RFC5348].

   At a minimum, the congestion information MUST be sent, from the
   receiver and from the sender, at least once per RTT.  Prior to
   establishing an RTT, the information SHOULD be sent constantly from
   the sender and the receiver so that an RTT estimate can be
   established.  Not receiving this information over multiple
   consecutive RTT intervals should be considered a congestion event
   that causes the sender to adjust its sending rate lower.  For
   example, this is called the "no feedback timeout" in [RFC4342], and
   it is equal to 4 RTT intervals.  When a "no feedback timeout" has
   occurred, the sending rate is halved, as per [RFC4342].
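
   The timeout behavior above can be sketched as follows.  This is a
   non-normative illustration only: the class, field, and function
   names are hypothetical, and the minimum-rate floor is an assumption
   that does not come from this document.

```python
from dataclasses import dataclass

# "No feedback timeout" of 4 RTT intervals, as cited from [RFC4342] above.
NO_FEEDBACK_RTTS = 4


@dataclass
class SendRateState:
    """Hypothetical sender-side state; all names are illustrative."""
    rate_pps: float             # current send rate in packets per second
    rtt_s: float                # current RTT estimate in seconds (0 = none yet)
    last_feedback_s: float      # local time congestion info was last received
    min_rate_pps: float = 1.0   # assumed floor so the rate never reaches zero


def check_no_feedback(state: SendRateState, now_s: float) -> bool:
    """Halve the send rate if no congestion info arrived for 4 RTTs.

    Returns True when the timeout fired and the rate was lowered.
    """
    if state.rtt_s <= 0:
        return False  # no RTT estimate yet, so no timeout can be computed
    if now_s - state.last_feedback_s < NO_FEEDBACK_RTTS * state.rtt_s:
        return False
    state.rate_pps = max(state.rate_pps / 2.0, state.min_rate_pps)
    state.last_feedback_s = now_s  # restart the timer after reacting
    return True
```

   A caller would invoke check_no_feedback() from its periodic timer
   and reset last_feedback_s whenever a congestion control header is
   received.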

   An implementation MAY choose to always include the congestion
   information in its AGGFRAG payload header if it is sending it on an
   IP-TFS-enabled SA.  Since IP-TFS normally will operate with a large
   packet size, the congestion information should represent a small
   portion of the available tunnel bandwidth.  An implementation
   choosing to always send the data MAY also choose to only update the
   LossEventRate and RTT header field values it sends every RTT, though.

   When choosing a congestion control algorithm (or a selection of
   algorithms), note that IP-TFS is not providing for reliable delivery
   of IP traffic, and so per-packet acknowledgements (ACKs) are not
   required and are not provided.

   It is worth noting that the variable send rate of a congestion-
   controlled AGGFRAG tunnel is not private; however, this send rate is
   being driven by network congestion, and as long as the encapsulated
   (inner) traffic flow shape and timing are not directly affecting the
   (outer) network congestion, the variations in the tunnel rate will
   not weaken the provided inner traffic flow confidentiality.

2.4.2.1.  Circuit Breakers

   In addition to congestion control, implementations that support the
   non-congestion-control mode SHOULD implement circuit breakers
   [RFC8084] as a recovery method of last resort.  When circuit breakers
   are enabled, an implementation SHOULD also enable congestion control
   reports so that circuit breakers have information to act on.

   The pseudowire congestion considerations [RFC7893] are equally
   applicable to the mechanisms defined in this document, notably the
   text on inelastic traffic.

   One example of a simple, slow-trip circuit breaker that an
   implementation may provide would utilize 2 values: the amount of
   persistent loss rate required to trip the circuit breaker and the
   required length of time this persistent loss rate must be seen to
   trip the circuit breaker.  These 2 values are required configurations
   from the user.  When the circuit breaker is tripped, the tunnel
   traffic is disabled and an appropriate log message or other
   management type alarm is triggered, indicating operator intervention
   is required.
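
   A minimal, non-normative sketch of such a slow-trip circuit breaker
   is shown below.  The class and parameter names are hypothetical; the
   2 configured values map to loss_threshold and trip_after_s.
   Disabling the tunnel and raising the alarm are left to the caller.

```python
from typing import Optional


class SlowTripCircuitBreaker:
    """Trips once the loss rate stays at or above a threshold long enough."""

    def __init__(self, loss_threshold: float, trip_after_s: float) -> None:
        self.loss_threshold = loss_threshold  # persistent loss rate, 0.0-1.0
        self.trip_after_s = trip_after_s      # how long it must persist
        self._above_since: Optional[float] = None
        self.tripped = False

    def report(self, loss_rate: float, now_s: float) -> bool:
        """Feed one loss-rate observation; returns True once tripped."""
        if self.tripped:
            return True  # stays tripped until an operator intervenes
        if loss_rate < self.loss_threshold:
            self._above_since = None  # the loss was not persistent
            return False
        if self._above_since is None:
            self._above_since = now_s
        if now_s - self._above_since >= self.trip_after_s:
            self.tripped = True
        return self.tripped
```

   The loss rate fed to report() would typically be derived from the
   congestion control reports mentioned above.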

2.5.  Summary of Receiver Processing

   An AGGFRAG-enabled SA receiver has a few tasks to perform.

   The receiver MAY process incoming AGGFRAG_PAYLOAD payloads as soon as
   they arrive, as much as it can, i.e., if the incoming AGGFRAG_PAYLOAD
   packet contains complete inner packet(s), the receiver should extract
   and transmit them immediately.  For partial packets, the receiver
   needs to keep the partial packets in memory until they fall out of
   the reordering window or until the missing parts of the packets
   are received, in which case, it will reassemble and transmit them.
   If the AGGFRAG_PAYLOAD payload contains multiple packets, they SHOULD
   be sent out in the order they are in the AGGFRAG_PAYLOAD (i.e., keep
   the original order they were received on the other end).  The cost of
   using this method is that an amplification of out-of-order delivery
   of inner packets can occur due to inner packet aggregation.

   Instead of the method described in the previous paragraph, the
   receiver MAY reorder out-of-order AGGFRAG_PAYLOAD payloads received
   into in-sequence-order AGGFRAG_PAYLOAD payloads (Section 2.2.3), and
   only after it has an in-order AGGFRAG_PAYLOAD payload stream would
   the receiver transmit the inner packets.  Using this method will
   ensure the inner packets are sent in order.  The cost of this method
   is that a lost packet will cause a delay of up to the lost packet
   timer interval (or the full reorder window if no lost packet timer is
   used).  Additionally, there can be extra burstiness in the output
   stream.  This burstiness can happen when a lost packet is dropped
   from the reorder window, and the remaining outer packets in the
   reorder window are immediately processed and sent out back to back.

   Additionally, if congestion control is enabled, the receiver sends
   congestion control data (Section 6.1.2) back to the sender, as
   described in Sections 2.4.2 and 3.

   Finally, a note on receiving incorrect BlockOffset values: To account
   for misbehaving senders, a receiver SHOULD gracefully handle the case
   where the BlockOffset of consecutive packets, and/or the inner packet
   they share, do not agree.  It MAY drop the inner packet or one or
   both of the outer packets.

3.  Congestion Information

   In order to support the congestion-controlled mode, the sender needs
   to know the loss event rate and to approximate the RTT [RFC5348].  In
   order to obtain these values, the receiver sends congestion control
   information on its SA back to the sender.  Thus, to support
   congestion control, the receiver MUST have a paired SA back to the
   sender (this is always the case when the tunnel was created using
   IKEv2).  If the SA back to the sender is a non-AGGFRAG_PAYLOAD-
   enabled SA, then an AGGFRAG_PAYLOAD empty payload (i.e., header only)
   is used to convey the information.

   In order to calculate a loss event rate compatible with [RFC5348],
   the receiver needs to have an RTT estimate.  Thus, the sender
   communicates this estimate in the RTT header field.  On startup, this
   value will be zero, as no RTT estimate is yet known.

   In order for the sender to estimate its RTT value, the sender places
   a timestamp value in the TVal header field.  On first receipt of this
   TVal, the receiver records the new TVal value, along with the time it
   arrived locally.  Subsequent receipt of the same TVal MUST NOT update
   the recorded time.

   When the receiver sends its congestion control header, it places this
   latest recorded TVal in the TEcho header field, along with 2 delay
   values: Echo Delay and Transmit Delay.  The Echo Delay value is the
   time delta, in microseconds, between the recorded arrival time of
   TVal and the current clock.  The second value, Transmit Delay, is the
   receiver's current transmission delay on the tunnel (i.e., the
   average time between sending packets on its half of the AGGFRAG
   tunnel).

   When the sender receives back its TVal in the TEcho header field, it
   calculates 2 RTT estimates.  The first is the actual delay found by
   subtracting the TEcho value from its current clock and then
   subtracting the Echo Delay as well.  The second RTT estimate is found
   by adding the received Transmit Delay header value to the sender's
   own transmission delay (i.e., the average time between sending
   packets on its half of the AGGFRAG tunnel).  The larger of these 2
   RTT estimates SHOULD be used as the RTT value.

   The two RTT estimates are required to handle different combinations
   of faster or slower tunnel packet paths with faster or slower fixed
   tunnel rates.  Choosing the larger of the two values guarantees that
   the RTT is never considered faster than the aggregate transmission
   delay based on the IP-TFS send rate (the second estimate), as well as
   never being considered faster than the actual RTT along the tunnel
   packet path (the first estimate).
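
   The selection between the 2 estimates can be sketched as follows.
   This is a non-normative illustration.  It assumes the sender fills
   TVal with a microsecond clock truncated to 32 bits, which is one
   possible choice since TVal is opaque to the receiver; the function
   and parameter names are hypothetical.

```python
def estimate_rtt_us(
    now_us: int,
    techo_us: int,
    echo_delay_us: int,
    peer_transmit_delay_us: int,
    own_transmit_delay_us: int,
) -> int:
    """Return the larger of the path-based and rate-based RTT estimates."""
    # First estimate: time since TVal was sent, minus the time the
    # receiver held it.  The mask handles wraparound of the assumed
    # 32-bit microsecond clock.
    elapsed_us = (now_us - techo_us) & 0xFFFFFFFF
    path_rtt_us = max(elapsed_us - echo_delay_us, 0)

    # Second estimate: sum of the transmission delays of both halves
    # of the tunnel.
    rate_rtt_us = peer_transmit_delay_us + own_transmit_delay_us

    return max(path_rtt_us, rate_rtt_us)
```

   For a fast path with slow fixed send rates, the second estimate
   dominates; for a slow path with fast send rates, the first does.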

   The receiver also calculates, and communicates in the LossEventRate
   header field, the loss event rate for use by the sender.  This is
   slightly different from [RFC4342], which periodically sends all the
   loss interval data back to the sender so that it can do the
   calculation.  See Appendix B for a suggested way to calculate the
   loss event rate value.  Initially, this value will be zero
   (indicating no loss) until enough data has been collected by the
   receiver to update it.

3.1.  ECN Support

   In addition to normal packet loss information, the AGGFRAG mode
   supports use of the ECN bits in the encapsulating IP header [RFC3168]
   for identifying congestion.  If ECN use is enabled and a packet
   arrives at the egress (receiving) side with the Congestion
   Experienced (CE) value set, then the receiver considers that packet
   as being dropped, although it does not drop it.  The receiver MUST
   set the E bit in any AGGFRAG_PAYLOAD payload header containing a
   LossEventRate value derived from a CE value being considered.

   In [RFC6040], which updates [RFC3168] and [RFC4301], behaviors for
   marking the outer ECN field value based on the ECN field of the inner
   packet are defined.  As the AGGFRAG mode may have multiple inner
   packets present in a single outer packet, and there is no obvious
   correct way to map these multiple values to the single outer packet
   ECN field value, the tunnel ingress endpoint SHOULD operate in the
   "compatibility" mode, rather than the "default" mode from [RFC6040].
   In particular, this means that the ingress (sending) endpoint of the
   tunnel always sets the newly constructed outer encapsulating packet
   header ECN field to Not-ECT [RFC6040].

4.  Configuration of AGGFRAG Tunnels for IP-TFS

   IP-TFS is meant to be deployable with a minimal amount of
   configuration.  All IP-TFS-specific configuration should be specified
   at the unidirectional tunnel ingress (sending) side.  It is intended
   that non-IKEv2 operation is supported, at least, with local static
   configuration.

   YANG and MIB documents have been defined for IP-TFS in [RFC9348] and
   [RFC9349].

4.1.  Bandwidth

   Bandwidth is a local configuration option.  For the non-congestion-
   controlled mode, the bandwidth SHOULD be configured.  For the
   congestion-controlled mode, the bandwidth can be configured or the
   congestion control algorithm discovers and uses the maximum bandwidth
   available.  No standardized configuration method is required.

4.2.  Fixed Packet Size

   The fixed packet size to be used for the tunnel encapsulation packets
   MAY be configured manually or can be automatically determined using
   other methods, such as PLMTUD [RFC4821] [RFC8899] or PMTUD [RFC1191]
   [RFC8201].  As PMTUD is known to have issues, PLMTUD is considered
   the more robust option.  No standardized configuration method is
   required.

4.3.  Congestion Control

   Congestion control is a local configuration option.  No standardized
   configuration method is required.

5.  IKEv2

5.1.  USE_AGGFRAG Notification Message

   As mentioned previously, AGGFRAG tunnels utilize ESP payloads of type
   AGGFRAG_PAYLOAD.

   When using IKEv2, a new "USE_AGGFRAG" notification message enables
   the AGGFRAG_PAYLOAD payload on a Child SA pair.  The method used is
   similar to how USE_TRANSPORT_MODE is negotiated, as described in
   [RFC7296].

   To request use of the AGGFRAG_PAYLOAD payload on the Child SA pair,
   the initiator includes the USE_AGGFRAG notification in an SA payload
   requesting a new Child SA (either during the initial IKE_AUTH or
   during CREATE_CHILD_SA exchanges).  If the request is accepted, then
   the response MUST also include a notification of type USE_AGGFRAG.
   If the responder declines the request, the Child SA will be
   established without AGGFRAG_PAYLOAD payload use enabled.  If this is
   unacceptable to the initiator, the initiator MUST delete the Child
   SA.

   As the use of the AGGFRAG_PAYLOAD payload is currently only defined
   for non-transport-mode tunnels, the USE_AGGFRAG notification MUST NOT
   be combined with the USE_TRANSPORT_MODE notification.

   The USE_AGGFRAG notification contains a 1-octet payload of flags that
   specify requirements from the sender of the notification.  If any
   requirement flags are not understood or cannot be supported by the
   receiver, then the receiver SHOULD NOT enable use of AGGFRAG_PAYLOAD
   (either by not responding with the USE_AGGFRAG notification or, in
   the case of the initiator, by deleting the Child SA if the now-
   established non-AGGFRAG_PAYLOAD using SA is unacceptable).

   The notification type and payload flag values are defined in
   Section 6.1.4.

6.  Packet and Data Formats

   The packet and data formats defined below are generic with the intent
   of allowing for non-IP-TFS uses, but such uses are outside the scope
   of this document.

6.1.  AGGFRAG_PAYLOAD Payload

   ESP Next Header value: 144

   An AGGFRAG payload is identified by the ESP Next Header value
   AGGFRAG_PAYLOAD (144), which has been reserved in the IP protocol
   numbers space.  The first octet of the payload indicates the format
   of the remaining payload data.

     0 1 2 3 4 5 6 7
    +-+-+-+-+-+-+-+-+-+-+-
    |   Sub-type    | ...
    +-+-+-+-+-+-+-+-+-+-+-

                  Figure 3: AGGFRAG_PAYLOAD Payload Format

   Sub-type:
      An 8-bit value indicating the payload format.

   This document defines 2 payload sub-types.  These payload formats are
   defined in the following sections.

6.1.1.  Non-Congestion-Control AGGFRAG_PAYLOAD Payload Format

   The non-congestion-control AGGFRAG_PAYLOAD payload consists of a
   4-octet header, followed by a variable amount of DataBlocks data, as
   shown below.

                         1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Sub-Type (0) |   Reserved    |          BlockOffset          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |       DataBlocks ...
    +-+-+-+-+-+-+-+-+-+-+-

              Figure 4: Non-Congestion-Control Payload Format

   Sub-type:
      An octet indicating the payload format.  For this non-congestion-
      control format, the value is 0.

   Reserved:
      An octet set to 0 on generation and ignored on receipt.

   BlockOffset:
      A 16-bit unsigned integer counting the number of octets of
      DataBlocks data before the start of a new data block.  If the
      start of a new data block occurs in a subsequent payload, the
      BlockOffset will point past the end of the DataBlocks data.  In
      this case, all the DataBlocks data belongs to the current data
      block being assembled.  When the BlockOffset extends into
      subsequent payloads, it continues to count only DataBlocks data
      (i.e., it does not count the non-DataBlocks data of subsequent
      packets, such as header octets).

   DataBlocks:
      Variable number of octets that begins with the start of a data
      block or the continuation of a previous data block, followed by
      zero or more additional data blocks.
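
   As a non-normative illustration, the 4-octet header in Figure 4 can
   be packed and parsed as shown below.  The function names are
   hypothetical, and fields are assumed to be in network byte order,
   as is conventional for the packet diagrams used here.

```python
import struct
from typing import Tuple

# Sub-type (1 octet), Reserved (1 octet), BlockOffset (2 octets).
NON_CC_HEADER = struct.Struct("!BBH")


def pack_non_cc(block_offset: int, data_blocks: bytes) -> bytes:
    """Build a non-congestion-control payload (Sub-type 0)."""
    return NON_CC_HEADER.pack(0, 0, block_offset) + data_blocks


def unpack_non_cc(payload: bytes) -> Tuple[int, bytes]:
    """Return (BlockOffset, DataBlocks); Reserved is ignored on receipt."""
    sub_type, _reserved, block_offset = NON_CC_HEADER.unpack_from(payload)
    if sub_type != 0:
        raise ValueError("not a non-congestion-control payload")
    return block_offset, payload[NON_CC_HEADER.size:]
```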

6.1.2.  Congestion Control AGGFRAG_PAYLOAD Payload Format

   The congestion control AGGFRAG_PAYLOAD payload consists of a 24-octet
   header, followed by a variable amount of DataBlocks data, as shown
   below.

                         1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Sub-type (1) |  Reserved |P|E|          BlockOffset          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          LossEventRate                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      RTT                  |   Echo Delay ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         ... Echo Delay   |           Transmit Delay                |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                              TVal                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             TEcho                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |       DataBlocks ...
    +-+-+-+-+-+-+-+-+-+-+-

                Figure 5: Congestion Control Payload Format

   Sub-type:
      An octet indicating the payload format.  For this congestion
      control format, the value is 1.

   Reserved:
      A 6-bit field set to 0 on generation and ignored on receipt.

   P:
      A 1-bit value that, if set, indicates that PLMTUD probing is in
      progress.  This information can be used to avoid treating missing
      packets as loss events by the congestion control algorithm when
      running the PLMTUD probe algorithm.

   E:
      A 1-bit value that, if set, indicates that Congestion Experienced
      (CE) ECN bits were received and used in deriving the reported
      LossEventRate.

   BlockOffset:
      The same value as the non-congestion-controlled payload format
      value.

   LossEventRate:
      A 32-bit value specifying the inverse of the current loss event
      rate, as calculated by the receiver.  A value of zero indicates no
      loss.  Otherwise, the loss event rate is 1/LossEventRate.

   RTT:
      A 22-bit value specifying the sender's current RTT estimate in
      microseconds.  The value MAY be zero prior to the sender having
      calculated an RTT estimate.  The value SHOULD be set to zero on
      non-AGGFRAG_PAYLOAD-enabled SAs.  If the RTT is equal to or larger
      than 0x3FFFFF, the value MUST be set to 0x3FFFFF.

   Echo Delay:
      A 21-bit value specifying the delay in microseconds incurred
      between the receiver first receiving the TVal value and sending
      it back in TEcho.  If the delay is equal to or larger than
      0x1FFFFF, the value MUST be set to 0x1FFFFF.

   Transmit Delay:
      A 21-bit value specifying the transmission delay in microseconds.
      This is the fixed (or average) delay on the receiver between it
      sending packets on the IP-TFS tunnel.  If the delay is equal to or
      larger than 0x1FFFFF, the value MUST be set to 0x1FFFFF.

   TVal:
      An opaque, 32-bit value that will be echoed back by the receiver
      in later packets in the TEcho field, along with an Echo Delay
      value of how long that echo took.

   TEcho:
      The opaque, 32-bit value from a received packet's TVal field.  The
      received TVal is placed in TEcho, along with an Echo Delay value
      indicating how long it has been since receiving the TVal value.

   DataBlocks:
      Variable number of octets that begins with the start of a data
      block or the continuation of a previous data block, followed by
      zero or more additional data blocks.  For the special case of
      sending congestion control information on a non-IP-TFS-enabled SA,
      this field MUST be empty (i.e., be zero octets long).
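
   The bit layout in Figure 5 can be sketched in code as shown below.
   This is a non-normative illustration with hypothetical names.  The
   RTT, Echo Delay, and Transmit Delay fields (22 + 21 + 21 bits) are
   handled as one 64-bit quantity, and values are saturated at their
   field maximums as required above.  Network byte order is assumed.

```python
import struct
from typing import NamedTuple


class CcHeader(NamedTuple):
    p: bool                  # PLMTUD probing in progress
    e: bool                  # CE marks were used in LossEventRate
    block_offset: int
    loss_event_rate: int     # inverse loss event rate, 0 = no loss
    rtt_us: int
    echo_delay_us: int
    transmit_delay_us: int
    tval: int
    techo: int


# Sub-type, flags octet, BlockOffset, LossEventRate,
# RTT|Echo Delay|Transmit Delay (64 bits), TVal, TEcho = 24 octets.
_CC = struct.Struct("!BBHIQII")
RTT_MAX = 0x3FFFFF     # 22 bits
DELAY_MAX = 0x1FFFFF   # 21 bits


def pack_cc_header(h: CcHeader) -> bytes:
    flags = (int(h.p) << 1) | int(h.e)  # Reserved(6) | P | E
    timing = (
        (min(h.rtt_us, RTT_MAX) << 42)
        | (min(h.echo_delay_us, DELAY_MAX) << 21)
        | min(h.transmit_delay_us, DELAY_MAX)
    )
    return _CC.pack(1, flags, h.block_offset, h.loss_event_rate,
                    timing, h.tval, h.techo)


def unpack_cc_header(payload: bytes) -> CcHeader:
    sub_type, flags, offset, ler, timing, tval, techo = _CC.unpack_from(payload)
    if sub_type != 1:
        raise ValueError("not a congestion control payload")
    return CcHeader(
        p=bool(flags & 0x02),
        e=bool(flags & 0x01),
        block_offset=offset,
        loss_event_rate=ler,
        rtt_us=timing >> 42,
        echo_delay_us=(timing >> 21) & DELAY_MAX,
        transmit_delay_us=timing & DELAY_MAX,
        tval=tval,
        techo=techo,
    )
```

   Any DataBlocks data follows immediately after these 24 octets.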

6.1.3.  Data Blocks

                         1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Type  | IPv4, IPv6, or pad...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

                        Figure 6: Data Block Format

   Type:
      A 4-bit field where 0x0 identifies a Pad Data Block, 0x4 indicates
      an IPv4 data block, and 0x6 indicates an IPv6 data block.

6.1.3.1.  IPv4 Data Block

                         1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  0x4  |  IHL  |  TypeOfService  |         TotalLength         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Rest of the inner packet ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

                      Figure 7: IPv4 Data Block Format

   These values are the actual values within the encapsulated IPv4
   header.  In other words, the start of this data block is the start of
   the encapsulated IP packet.

   Type:
      A 4-bit value of 0x4 indicating IPv4 (i.e., first nibble of the
      IPv4 packet).

   TotalLength:
      The 16-bit unsigned integer "Total Length" field of the IPv4 inner
      packet.

6.1.3.2.  IPv6 Data Block

                         1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  0x6  | TrafficClass  |               FlowLabel               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         PayloadLength         | Rest of the inner packet ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

                      Figure 8: IPv6 Data Block Format

   These values are the actual values within the encapsulated IPv6
   header.  In other words, the start of this data block is the start of
   the encapsulated IP packet.

   Type:
      A 4-bit value of 0x6 indicating IPv6 (i.e., first nibble of the
      IPv6 packet).

   PayloadLength:
      The 16-bit unsigned integer "Payload Length" field of the IPv6
      inner packet.

6.1.3.3.  Pad Data Block

                         1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  0x0  | Padding ...
    +-+-+-+-+-+-+-+-+-+-+-

                      Figure 9: Pad Data Block Format

   Type:
      A 4-bit value of 0x0 indicating a padding data block.

   Padding:
      Extends to end of the encapsulating packet.
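
   A receiver can walk the DataBlocks data using only the Type nibble
   and the inner packet's own length field.  The following is a
   non-normative sketch with hypothetical names.  It assumes the input
   begins at the start of a data block (i.e., any continuation octets
   indicated by BlockOffset have already been removed) and that IPv6
   jumbograms are not in use.

```python
from typing import List, Optional, Tuple

IPV4_MIN_HEADER_LEN = 20
IPV6_HEADER_LEN = 40  # PayloadLength excludes the fixed IPv6 header


def split_data_blocks(data: bytes) -> List[Tuple[str, bytes, Optional[int]]]:
    """Return (kind, octets_present, full_length) for each data block.

    full_length is None for padding, or when the payload ends before
    the inner packet's length field could be read.
    """
    blocks: List[Tuple[str, bytes, Optional[int]]] = []
    pos = 0
    while pos < len(data):
        kind = data[pos] >> 4
        remaining = len(data) - pos
        full: Optional[int] = None
        if kind == 0x0:
            # A pad data block extends to the end of the packet.
            blocks.append(("pad", data[pos:], None))
            break
        if kind == 0x4:
            name = "ipv4"
            if remaining >= 4:
                full = int.from_bytes(data[pos + 2:pos + 4], "big")
                if full < IPV4_MIN_HEADER_LEN:
                    raise ValueError("invalid IPv4 TotalLength")
        elif kind == 0x6:
            name = "ipv6"
            if remaining >= 6:
                payload_len = int.from_bytes(data[pos + 4:pos + 6], "big")
                full = IPV6_HEADER_LEN + payload_len
        else:
            raise ValueError("unknown data block type")
        # A block longer than what remains continues in a later payload.
        end = len(data) if full is None else min(pos + full, len(data))
        blocks.append((name, data[pos:end], full))
        pos = end
    return blocks
```

   A block whose octets_present is shorter than full_length is a
   fragment that continues in the next AGGFRAG_PAYLOAD payload.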

6.1.4.  IKEv2 USE_AGGFRAG Notification Message

   As discussed in Section 5.1, a notification message USE_AGGFRAG is
   used to negotiate use of the ESP AGGFRAG_PAYLOAD Next Header value.

   The USE_AGGFRAG Notification Message Status Type is 16442.

   The notification payload contains 1 octet of requirement flags.
   There are currently 2 requirement flags defined.  This may be revised
   by later specifications.

    +-+-+-+-+-+-+-+-+
    |0|0|0|0|0|0|C|D|
    +-+-+-+-+-+-+-+-+

                  Figure 10: USE_AGGFRAG Requirement Flags

   0:
      6 bits - Reserved; MUST be zero on send, unless defined by later
      specifications.

   C:
      Congestion Control bit.  If set, then the sender is requiring that
      congestion control information MUST be returned to it
      periodically, as defined in Section 3.

   D:
      Don't Fragment bit.  If set, it indicates the sender of the notify
      message does not support receiving packet fragments (i.e., inner
      packets MUST be sent using a single Data Block).  This value only
      applies to what the sender is capable of receiving; the sender MAY
      still send packet fragments unless similarly restricted by the
      receiver in its USE_AGGFRAG notification.
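
   Interpreting the flags octet in Figure 10 can be sketched as shown
   below.  This is a non-normative illustration with hypothetical
   names.  Returning None models the rule in Section 5.1 that a
   receiver not understanding a requirement flag SHOULD NOT enable use
   of AGGFRAG_PAYLOAD.

```python
from typing import Dict, Optional

FLAG_D = 0x01  # Don't Fragment (least significant bit in Figure 10)
FLAG_C = 0x02  # Congestion Control
KNOWN_FLAGS = FLAG_C | FLAG_D


def parse_use_aggfrag_flags(octet: int) -> Optional[Dict[str, bool]]:
    """Decode the 1-octet requirement flags, or None if unsupported."""
    if octet & ~KNOWN_FLAGS & 0xFF:
        return None  # a flag not understood by this implementation is set
    return {
        "congestion_control": bool(octet & FLAG_C),
        "dont_fragment": bool(octet & FLAG_D),
    }
```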

7.  IANA Considerations

7.1.  ESP Next Header Value

   IANA has allocated an IP protocol number from the "Protocol Numbers -
   Assigned Internet Protocol Numbers" registry as follows.

   Decimal:  144
   Keyword:  AGGFRAG
   Protocol:  AGGFRAG encapsulation payload for ESP
   Reference:  RFC 9347

7.2.  AGGFRAG_PAYLOAD Sub-Types

   IANA has created a registry called "AGGFRAG_PAYLOAD Sub-Types" under
   a new category named "ESP AGGFRAG_PAYLOAD".  The registration policy
   for this registry is "Expert Review" [RFC8126] [RFC7120].

   Name:  AGGFRAG_PAYLOAD Sub-Types
   Description:  AGGFRAG_PAYLOAD Payload Formats
   Reference:  RFC 9347

   The initial content for this registry is as follows:

         +==========+===============================+===========+
         | Sub-Type | Name                          | Reference |
         +==========+===============================+===========+
         | 0        | Non-Congestion-Control Format | RFC 9347  |
         +----------+-------------------------------+-----------+
         | 1        | Congestion Control Format     | RFC 9347  |
         +----------+-------------------------------+-----------+
         | 3-255    | Reserved                      |           |
         +----------+-------------------------------+-----------+

                    Table 1: AGGFRAG_PAYLOAD Sub-Types

7.3.  USE_AGGFRAG Notify Message Status Type

   IANA has allocated a status type USE_AGGFRAG from the "IKEv2 Notify
   Message Types - Status Types" registry.

   Decimal:  16442
   Name:  USE_AGGFRAG
   Reference:  RFC 9347

8.  Security Considerations

   This document describes an aggregation and fragmentation mechanism to
   efficiently implement TFC for IP traffic.  This approach is expected
   to reduce the efficacy of traffic analysis on IPsec communication.
   Other than the additional security afforded by using this mechanism,
   IP-TFS utilizes the security protocols [RFC4303] and [RFC7296], and
   so their security considerations apply to IP-TFS as well.

   As noted in Section 3.1, the ECN bits are not protected by IPsec and
   thus may constitute a covert channel.  For this reason, ECN use
   SHOULD NOT be enabled by default.

   As noted previously in Section 2.4.2, for TFC to be maintained, the
   encapsulated traffic flow should not be affecting network congestion
   in a predictable way, and if it would be, then non-congestion-
   controlled mode use should be considered instead.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)",
              RFC 4303, DOI 10.17487/RFC4303, December 2005,
              <https://www.rfc-editor.org/info/rfc4303>.

   [RFC7296]  Kaufman, C., Hoffman, P., Nir, Y., Eronen, P., and T.
              Kivinen, "Internet Key Exchange Protocol Version 2
              (IKEv2)", STD 79, RFC 7296, DOI 10.17487/RFC7296, October
              2014, <https://www.rfc-editor.org/info/rfc7296>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

9.2.  Informative References

   [AppCrypt] Schneier, B., "Applied Cryptography: Protocols,
              Algorithms, and Source Code in C", 1996.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981,
              <https://www.rfc-editor.org/info/rfc791>.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              DOI 10.17487/RFC1191, November 1990,
              <https://www.rfc-editor.org/info/rfc1191>.

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              DOI 10.17487/RFC2474, December 1998,
              <https://www.rfc-editor.org/info/rfc2474>.

   [RFC2914]  Floyd, S., "Congestion Control Principles", BCP 41,
              RFC 2914, DOI 10.17487/RFC2914, September 2000,
              <https://www.rfc-editor.org/info/rfc2914>.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <https://www.rfc-editor.org/info/rfc3168>.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, DOI 10.17487/RFC4301,
              December 2005, <https://www.rfc-editor.org/info/rfc4301>.

   [RFC4342]  Floyd, S., Kohler, E., and J. Padhye, "Profile for
              Datagram Congestion Control Protocol (DCCP) Congestion
              Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342,
              DOI 10.17487/RFC4342, March 2006,
              <https://www.rfc-editor.org/info/rfc4342>.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007,
              <https://www.rfc-editor.org/info/rfc4821>.

   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
              Friendly Rate Control (TFRC): Protocol Specification",
              RFC 5348, DOI 10.17487/RFC5348, September 2008,
              <https://www.rfc-editor.org/info/rfc5348>.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
              Notification", RFC 6040, DOI 10.17487/RFC6040, November
              2010, <https://www.rfc-editor.org/info/rfc6040>.

   [RFC7120]  Cotton, M., "Early IANA Allocation of Standards Track Code
              Points", BCP 100, RFC 7120, DOI 10.17487/RFC7120, January
              2014, <https://www.rfc-editor.org/info/rfc7120>.

   [RFC7510]  Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
              "Encapsulating MPLS in UDP", RFC 7510,
              DOI 10.17487/RFC7510, April 2015,
              <https://www.rfc-editor.org/info/rfc7510>.

   [RFC7893]  Stein, Y(J)., Black, D., and B. Briscoe, "Pseudowire
              Congestion Considerations", RFC 7893,
              DOI 10.17487/RFC7893, June 2016,
              <https://www.rfc-editor.org/info/rfc7893>.

   [RFC8084]  Fairhurst, G., "Network Transport Circuit Breakers",
              BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017,
              <https://www.rfc-editor.org/info/rfc8084>.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017,
              <https://www.rfc-editor.org/info/rfc8126>.

   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", STD 86, RFC 8200,
              DOI 10.17487/RFC8200, July 2017,
              <https://www.rfc-editor.org/info/rfc8200>.

   [RFC8201]  McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed.,
              "Path MTU Discovery for IP version 6", STD 87, RFC 8201,
              DOI 10.17487/RFC8201, July 2017,
              <https://www.rfc-editor.org/info/rfc8201>.

   [RFC8546]  Trammell, B. and M. Kuehlewind, "The Wire Image of a
              Network Protocol", RFC 8546, DOI 10.17487/RFC8546, April
              2019, <https://www.rfc-editor.org/info/rfc8546>.

   [RFC8899]  Fairhurst, G., Jones, T., Tüxen, M., Rüngeler, I., and T.
              Völker, "Packetization Layer Path MTU Discovery for
              Datagram Transports", RFC 8899, DOI 10.17487/RFC8899,
              September 2020, <https://www.rfc-editor.org/info/rfc8899>.

   [RFC9329]  Pauly, T. and V. Smyslov, "TCP Encapsulation of Internet
              Key Exchange Protocol (IKE) and IPsec Packets", RFC 9329,
              DOI 10.17487/RFC9329, November 2022,
              <https://www.rfc-editor.org/info/rfc9329>.

   [RFC9348]  Fedyk, D. and C. Hopps, "A YANG Data Model for IP Traffic
              Flow Security", RFC 9348, DOI 10.17487/RFC9348, January
              2023, <https://www.rfc-editor.org/info/rfc9348>.

   [RFC9349]  Fedyk, D. and E. Kinzie, "Definitions of Managed Objects
              for IP Traffic Flow Security", RFC 9349,
              DOI 10.17487/RFC9349, January 2023,
              <https://www.rfc-editor.org/info/rfc9349>.

Appendix A.  Example of an Encapsulated IP Packet Flow

   Below, an example inner IP packet flow within the encapsulating
   tunnel packet stream is shown.  Notice how encapsulated IP packets
   can start and end anywhere, and more than one or less than one may
   occur in a single encapsulating packet.

     Offset: 0        Offset: 100    Offset: 2000    Offset: 600
    [ ESP1  (1404) ][ ESP2  (1404) ][ ESP3  (1404) ][ ESP4  (1404) ]
    [--750--][--750--][60][-240-][--3000----------------------][pad]

                   Figure 11: Inner and Outer Packet Flow

   Each outer encapsulating ESP space is a fixed size of 1404 octets,
   the first 4 octets of which contain the AGGFRAG header.  The
   encapsulated IP packet flow (lengths include the IP header and
   payload) is as follows: a 750-octet packet, a 750-octet packet, a
   60-octet packet, a 240-octet packet, and a 3000-octet packet.

   The BlockOffset values in the 4 AGGFRAG payload headers for this
   packet flow would thus be: 0, 100, 2000, and 600, respectively.  The
   first encapsulating packet (ESP1) has a zero BlockOffset, which
   points at the IP data block immediately following the AGGFRAG header.
   The following packet's (ESP2) BlockOffset points inward 100 octets to
   the start of the 60-octet data block.  The third encapsulating packet
   (ESP3) contains the middle portion of the 3000-octet data block, so
   the offset points past its end and into the fourth encapsulating
   packet.  The fourth packet's (ESP4) offset is 600, pointing at the
   padding that follows the completion of the continued 3000-octet
   packet.
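
   The offsets above can be reproduced with a short calculation.  The
   following Python sketch is illustrative only and is not part of the
   protocol definition; the function name block_offsets and its
   parameters are assumptions made for this example.  It treats the
   inner packets as one contiguous octet stream carried in 1400-octet
   slices (1404 minus the 4-octet AGGFRAG header) and, for each outer
   packet, reports how many octets of a continued data block remain
   before the next data block (or padding) begins.

```python
from itertools import accumulate


def block_offsets(inner_sizes, outer_size=1404, header_size=4):
    """Return one BlockOffset per outer packet (hypothetical helper).

    inner_sizes: lengths of the inner IP packets, in order, all > 0.
    outer_size:  fixed ESP payload space per outer packet.
    header_size: size of the AGGFRAG header inside that space.
    """
    data_per_outer = outer_size - header_size
    # Stream position at which each inner packet ends.
    ends = list(accumulate(inner_sizes))
    if not ends:
        return []
    # Ceiling division: outer packets needed to carry the whole stream.
    n_outer = -(-ends[-1] // data_per_outer)

    offsets = []
    for i in range(n_outer):
        outer_start = i * data_per_outer
        # First inner packet that ends after this outer packet begins.
        idx, end = next((j, e) for j, e in enumerate(ends)
                        if e > outer_start)
        start = end - inner_sizes[idx]
        # Zero if a new data block starts here; otherwise the number of
        # octets still owed to the continued data block.
        offsets.append(0 if start == outer_start else end - outer_start)
    return offsets
```

   For the flow in Figure 11, block_offsets([750, 750, 60, 240, 3000])
   yields [0, 100, 2000, 600].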

Appendix B.  A Send and Loss Event Rate Calculation

   The current best practice indicates that congestion control SHOULD be
   done in a TCP-friendly way.  A TCP-friendly congestion control
   algorithm is described in [RFC5348].  For this IP-TFS use case (as
   with [RFC4342]), the (fixed) packet size is used as the segment size
   for the algorithm.  The main formula in the algorithm for the send
   rate is then as follows:

                                 1
      X = -----------------------------------------------
          R * (sqrt(2*p/3) + 12*sqrt(3*p/8)*p*(1+32*p^2))

   X is the send rate in packets per second, R is the RTT estimate, and
   p is the loss event rate (the inverse of which is provided by the
   receiver).
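
   As a minimal sketch, the formula can be evaluated directly.  The
   function name tfrc_send_rate and the argument checks are assumptions
   of this example and do not come from [RFC5348]; R is taken in
   seconds and p as a fraction in the range (0, 1].

```python
import math


def tfrc_send_rate(rtt, p):
    """Send rate X in packets per second for RTT `rtt` (seconds) and
    loss event rate `p`, using the formula given above."""
    if rtt <= 0:
        raise ValueError("rtt must be positive")
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    denominator = rtt * (
        math.sqrt(2 * p / 3)
        + 12 * math.sqrt(3 * p / 8) * p * (1 + 32 * p ** 2)
    )
    return 1.0 / denominator
```

   With R = 0.1 seconds and p = 0.01, this evaluates to a little over
   112 packets per second.  The rate falls as either R or p grows.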

   In addition, the algorithm in [RFC5348] also uses an X_recv value
   (the receiver's receive rate).  For IP-TFS, one MAY set this value
   according to the sender's current tunnel send rate (X).

   The IP-TFS receiver, having the RTT estimate from the sender, can use
   the same method as described in [RFC5348] and [RFC4342] to collect
   the loss intervals and calculate the loss event rate value using the
   weighted average as indicated.  The receiver communicates the inverse
   of this value back to the sender in the AGGFRAG_PAYLOAD payload
   header field LossEventRate.
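
   The weighted average can be sketched as follows.  This example
   reflects one reading of the average loss interval procedure in
   [RFC5348] with a history of eight intervals; the weights, the
   function names, and the list layout are assumptions of this sketch
   and should be checked against that document, which remains
   authoritative.  The encoding of the result into the LossEventRate
   field is not shown.

```python
# Weights for the eight most recent loss intervals (newest first), as
# this sketch understands the RFC 5348 defaults.
WEIGHTS = (1.0, 1.0, 1.0, 1.0, 0.8, 0.6, 0.4, 0.2)


def mean_loss_interval(intervals):
    """Weighted mean loss interval, in packets.

    intervals[0] is the still-open interval since the latest loss
    event; intervals[1:] are closed intervals, newest first.
    """
    k = min(len(intervals) - 1, len(WEIGHTS))
    if k < 1:
        raise ValueError("need at least one closed loss interval")
    weights = WEIGHTS[:k]
    # Average that includes the open interval I_0 ...
    i_tot0 = sum(intervals[i] * weights[i] for i in range(k))
    # ... and the average that excludes it.
    i_tot1 = sum(intervals[i + 1] * weights[i] for i in range(k))
    return max(i_tot0, i_tot1) / sum(weights)


def loss_event_rate(intervals):
    """Loss event rate p; the receiver reports its inverse, which is
    the mean loss interval itself."""
    return 1.0 / mean_loss_interval(intervals)
```

   Taking the larger of the two sums means that a long open interval
   can lower p, while a short one cannot raise it.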

   The IP-TFS sender now has both the R and p values and can calculate
   the correct sending rate.  If following [RFC5348], the sender should
   also use the slow start mechanism described therein when the IP-TFS
   SA is first established.

Appendix C.  Comparisons of IP-TFS

C.1.  Comparing Overhead

   For comparing overhead, the overhead of ESP for both normal and
   AGGFRAG tunnel packets must be calculated, and so an algorithm for
   encryption and authentication must be chosen.  For the data below,
   AES-GCM-256 was selected.  This leads to an IP+ESP overhead of 54.

     54 = 20 (IP) + 8 (ESPH) + 2 (ESPF) + 8 (IV) + 16 (ICV)

   Additionally, for IP-TFS, non-congestion-control AGGFRAG_PAYLOAD
   headers were chosen, which adds 4 octets, for a total overhead of 58.

C.1.1.  IP-TFS Overhead

   For comparison, the overhead of an AGGFRAG payload is 58 octets per
   outer packet.  Therefore, the octet overhead per inner packet is 58
   multiplied by the number of outer packets required (fractions
   allowed).  The overhead as a percentage of inner packet size is a
   constant based on the Outer MTU size.

      OH = 58 / (Outer Payload Size / Inner Packet Size)
      OH % of Inner Packet Size = 100 * OH / Inner Packet Size
      OH % of Inner Packet Size = 5800 / Outer Payload Size

                   +=======+========+========+========+
                   | Type  | IP-TFS | IP-TFS | IP-TFS |
                   +=======+========+========+========+
                   | MTU   | 576    | 1500   | 9000   |
                   +=======+========+========+========+
                   | PSize | 518    | 1442   | 8942   |
                   +=======+========+========+========+
                   | 40    | 11.20% | 4.02%  | 0.65%  |
                   +-------+--------+--------+--------+
                   | 576   | 11.20% | 4.02%  | 0.65%  |
                   +-------+--------+--------+--------+
                   | 1500  | 11.20% | 4.02%  | 0.65%  |
                   +-------+--------+--------+--------+
                   | 9000  | 11.20% | 4.02%  | 0.65%  |
                   +-------+--------+--------+--------+

                       Table 2: IP-TFS Overhead as
                     Percentage of Inner Packet Size
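
   The values in Table 2 can be checked with a few lines of Python.
   This is a sketch only; the constant and function names are
   assumptions of the example.  It scales the 58 octets of per-outer-
   packet overhead by the fraction of an outer payload that the inner
   packet occupies.

```python
IP_ESP_OVERHEAD = 54      # 20 (IP) + 8 (ESPH) + 2 (ESPF) + 8 (IV) + 16 (ICV)
AGGFRAG_HEADER = 4        # non-congestion-control AGGFRAG_PAYLOAD header
IPTFS_OVERHEAD = IP_ESP_OVERHEAD + AGGFRAG_HEADER   # 58 octets


def iptfs_overhead_octets(inner_packet_size, mtu):
    """Octets of overhead attributed to one inner packet."""
    outer_payload_size = mtu - IPTFS_OVERHEAD
    return IPTFS_OVERHEAD * inner_packet_size / outer_payload_size


def iptfs_overhead_percent(mtu):
    """Overhead as a percentage of inner packet size; the inner packet
    size cancels out, so only the MTU matters."""
    return 100 * IPTFS_OVERHEAD / (mtu - IPTFS_OVERHEAD)
```

   For example, iptfs_overhead_percent(576) evaluates to about 11.20,
   matching the first IP-TFS column of the table.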

C.1.2.  ESP with Padding Overhead

   The overhead per inner packet for constant-send-rate-padded ESP
   (i.e., original IPsec TFC) is 36 octets plus any padding, unless
   fragmentation is required.

   When fragmentation of the inner packet is required to fit in the
   outer IPsec packet, the overhead is the number of outer packets
   required to carry the fragmented inner packet times the sum of the
   inner IP Overhead (20) and the outer packet overhead (54), minus the
   initial inner IP Overhead, plus any required tail padding in the last
   encapsulation packet.  The required tail padding is the number of
   required packets times the difference of the Outer Payload Size and
   the IP Overhead, minus the Inner Payload Size.  So:

     Inner Payload Size = IP Packet Size - IP Overhead
     Outer Payload Size = MTU - IPsec Overhead

                   Inner Payload Size
     NF0 = ----------------------------------
            Outer Payload Size - IP Overhead

     NF = CEILING(NF0)

     OH = NF * (IP Overhead + IPsec Overhead)
          - IP Overhead
          + NF * (Outer Payload Size - IP Overhead)
          - Inner Payload Size

     OH = NF * (IPsec Overhead + Outer Payload Size)
          - (IP Overhead + Inner Payload Size)

     OH = NF * (IPsec Overhead + Outer Payload Size)
          - Inner Packet Size
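
   The final form of OH can be written as a short Python function.
   This is a sketch under stated assumptions: the names are invented
   for the example, and the unfragmented branch returns only the
   padding, which is an assumption chosen because it matches the
   unfragmented ESP+Pad entries in Table 3.

```python
import math

IP_OVERHEAD = 20       # inner IP header
IPSEC_OVERHEAD = 54    # outer IP + ESP overhead


def esp_pad_overhead(inner_packet_size, mtu):
    """Octets of overhead for one inner packet sent as padded ESP."""
    outer_payload_size = mtu - IPSEC_OVERHEAD
    if inner_packet_size <= outer_payload_size:
        # Assumption: no fragmentation, so count tail padding only.
        return outer_payload_size - inner_packet_size
    inner_payload_size = inner_packet_size - IP_OVERHEAD
    # NF = CEILING(Inner Payload Size / (Outer Payload Size - IP Overhead))
    nf = math.ceil(inner_payload_size / (outer_payload_size - IP_OVERHEAD))
    # OH = NF * (IPsec Overhead + Outer Payload Size) - Inner Packet Size
    return nf * (IPSEC_OVERHEAD + outer_payload_size) - inner_packet_size
```

   For a 576-octet inner packet and a 576-octet MTU, NF is 2 and the
   function returns 576 octets of overhead.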

C.2.  Overhead Comparison

   The following tables collect the overhead values for some common L3
   MTU sizes in order to compare them.  The first table is the number of
   octets of overhead for a given L3 MTU-sized packet.  The second table
   is the percentage of overhead in the same MTU-sized packet.

    +========+=========+=========+=========+========+========+========+
    | Type   | ESP+Pad | ESP+Pad | ESP+Pad | IP-TFS | IP-TFS | IP-TFS |
    +========+=========+=========+=========+========+========+========+
    | L3 MTU | 576     | 1500    | 9000    | 576    | 1500   | 9000   |
    +========+=========+=========+=========+========+========+========+
    | PSize  | 522     | 1446    | 8946    | 518    | 1442   | 8942   |
    +========+=========+=========+=========+========+========+========+
    | 40     | 482     | 1406    | 8906    | 4.5    | 1.6    | 0.3    |
    +--------+---------+---------+---------+--------+--------+--------+
    | 128    | 394     | 1318    | 8818    | 14.3   | 5.1    | 0.8    |
    +--------+---------+---------+---------+--------+--------+--------+
    | 256    | 266     | 1190    | 8690    | 28.7   | 10.3   | 1.7    |
    +--------+---------+---------+---------+--------+--------+--------+
    | 518    | 4       | 928     | 8428    | 58.0   | 20.8   | 3.4    |
    +--------+---------+---------+---------+--------+--------+--------+
    | 576    | 576     | 870     | 8370    | 64.5   | 23.2   | 3.7    |
    +--------+---------+---------+---------+--------+--------+--------+
    | 1442   | 286     | 4       | 7504    | 161.5  | 58.0   | 9.4    |
    +--------+---------+---------+---------+--------+--------+--------+
    | 1500   | 228     | 1500    | 7446    | 168.0  | 60.3   | 9.7    |
    +--------+---------+---------+---------+--------+--------+--------+
    | 8942   | 1426    | 1558    | 4       | 1001.2 | 359.7  | 58.0   |
    +--------+---------+---------+---------+--------+--------+--------+
    | 9000   | 1368    | 1500    | 9000    | 1007.7 | 362.0  | 58.4   |
    +--------+---------+---------+---------+--------+--------+--------+

                   Table 3: Overhead Comparison in Octets

    +=======+=========+=========+==========+========+========+========+
    | Type  | ESP+Pad | ESP+Pad | ESP+Pad  | IP-TFS | IP-TFS | IP-TFS |
    +=======+=========+=========+==========+========+========+========+
    | MTU   | 576     | 1500    | 9000     | 576    | 1500   | 9000   |
    +=======+=========+=========+==========+========+========+========+
    | PSize | 522     | 1446    | 8946     | 518    | 1442   | 8942   |
    +=======+=========+=========+==========+========+========+========+
    | 40    | 1205.0% | 3515.0% | 22265.0% | 11.20% | 4.02%  | 0.65%  |
    +-------+---------+---------+----------+--------+--------+--------+
    | 128   | 307.8%  | 1029.7% | 6889.1%  | 11.20% | 4.02%  | 0.65%  |
    +-------+---------+---------+----------+--------+--------+--------+
    | 256   | 103.9%  | 464.8%  | 3394.5%  | 11.20% | 4.02%  | 0.65%  |
    +-------+---------+---------+----------+--------+--------+--------+
    | 518   | 0.8%    | 179.2%  | 1627.0%  | 11.20% | 4.02%  | 0.65%  |
    +-------+---------+---------+----------+--------+--------+--------+
    | 576   | 100.0%  | 151.0%  | 1453.1%  | 11.20% | 4.02%  | 0.65%  |
    +-------+---------+---------+----------+--------+--------+--------+
    | 1442  | 19.8%   | 0.3%    | 520.4%   | 11.20% | 4.02%  | 0.65%  |
    +-------+---------+---------+----------+--------+--------+--------+
    | 1500  | 15.2%   | 100.0%  | 496.4%   | 11.20% | 4.02%  | 0.65%  |
    +-------+---------+---------+----------+--------+--------+--------+
    | 8942  | 15.9%   | 17.4%   | 0.0%     | 11.20% | 4.02%  | 0.65%  |
    +-------+---------+---------+----------+--------+--------+--------+
    | 9000  | 15.2%   | 16.7%   | 100.0%   | 11.20% | 4.02%  | 0.65%  |
    +-------+---------+---------+----------+--------+--------+--------+

            Table 4: Overhead as Percentage of Inner Packet Size

C.3.  Comparing Available Bandwidth

   Another way to compare the two solutions is to look at the amount of
   available bandwidth each solution provides.  The following sections
   consider and compare the percentage of available bandwidth.  For the
   sake of providing a well-understood baseline, normal (unencrypted)
   Ethernet and normal ESP values are included.

C.3.1.  Ethernet

   In order to calculate the available bandwidth, the per-packet
   overhead is calculated first.  The total overhead of Ethernet is 14+4
   octets of header and Cyclic Redundancy Check (CRC) plus an additional
   20 octets of framing (preamble, start, and inter-packet gap), for a
   total of 38 octets.  Additionally, the minimum payload is 46 octets.

   +====+=======+=======+=======+=======+=======+=======+======+======+
   |Size| E + P | E + P | E + P | IPTFS | IPTFS | IPTFS | Enet | ESP  |
   +====+=======+=======+=======+=======+=======+=======+======+======+
   |MTU | 590   | 1514  | 9014  | 590   | 1514  | 9014  | any  | any  |
   +====+=======+=======+=======+=======+=======+=======+======+======+
   |OH  | 92    | 92    | 92    | 96    | 96    | 96    | 38   | 74   |
   +====+=======+=======+=======+=======+=======+=======+======+======+
   |40  | 614   | 1538  | 9038  | 47    | 42    | 40    | 84   | 114  |
   +----+-------+-------+-------+-------+-------+-------+------+------+
   |128 | 614   | 1538  | 9038  | 151   | 136   | 129   | 166  | 202  |
   +----+-------+-------+-------+-------+-------+-------+------+------+
   |256 | 614   | 1538  | 9038  | 303   | 273   | 258   | 294  | 330  |
   +----+-------+-------+-------+-------+-------+-------+------+------+
   |518 | 614   | 1538  | 9038  | 614   | 552   | 523   | 556  | 592  |
   +----+-------+-------+-------+-------+-------+-------+------+------+
   |576 | 1228  | 1538  | 9038  | 682   | 614   | 582   | 614  | 650  |
   +----+-------+-------+-------+-------+-------+-------+------+------+
   |1442| 1842  | 1538  | 9038  | 1709  | 1538  | 1457  | 1480 | 1516 |
   +----+-------+-------+-------+-------+-------+-------+------+------+
   |1500| 1842  | 3076  | 9038  | 1777  | 1599  | 1516  | 1538 | 1574 |
   +----+-------+-------+-------+-------+-------+-------+------+------+
   |8942| 11052 | 10766 | 9038  | 10599 | 9537  | 9038  | 8980 | 9016 |
   +----+-------+-------+-------+-------+-------+-------+------+------+
   |9000| 11052 | 10766 | 18076 | 10667 | 9599  | 9096  | 9038 | 9074 |
   +----+-------+-------+-------+-------+-------+-------+------+------+

                      Table 5: L2 Octets Per Packet

   +====+=======+=======+======+=======+=======+=======+=======+=======+
   |Size| E + P | E + P |E + P | IPTFS | IPTFS | IPTFS | Enet  | ESP   |
   +====+=======+=======+======+=======+=======+=======+=======+=======+
   |MTU | 590   | 1514  | 9014 | 590   | 1514  | 9014  | any   | any   |
   +====+=======+=======+======+=======+=======+=======+=======+=======+
   |OH  | 92    | 92    | 92   | 96    | 96    | 96    | 38    | 74    |
   +====+=======+=======+======+=======+=======+=======+=======+=======+
   |40  | 2.0M  | 0.8M  | 0.1M | 26.4M | 29.3M | 30.9M | 14.9M | 11.0M |
   +----+-------+-------+------+-------+-------+-------+-------+-------+
   |128 | 2.0M  | 0.8M  | 0.1M | 8.2M  | 9.2M  | 9.7M  | 7.5M  | 6.2M  |
   +----+-------+-------+------+-------+-------+-------+-------+-------+
   |256 | 2.0M  | 0.8M  | 0.1M | 4.1M  | 4.6M  | 4.8M  | 4.3M  | 3.8M  |
   +----+-------+-------+------+-------+-------+-------+-------+-------+
   |518 | 2.0M  | 0.8M  | 0.1M | 2.0M  | 2.3M  | 2.4M  | 2.2M  | 2.1M  |
   +----+-------+-------+------+-------+-------+-------+-------+-------+
   |576 | 1.0M  | 0.8M  | 0.1M | 1.8M  | 2.0M  | 2.1M  | 2.0M  | 1.9M  |
   +----+-------+-------+------+-------+-------+-------+-------+-------+
   |1442| 678K  | 812K  | 138K | 731K  | 812K  | 857K  | 844K  | 824K  |
   +----+-------+-------+------+-------+-------+-------+-------+-------+
   |1500| 678K  | 406K  | 138K | 703K  | 781K  | 824K  | 812K  | 794K  |
   +----+-------+-------+------+-------+-------+-------+-------+-------+
   |8942| 113K  | 116K  | 138K | 117K  | 131K  | 138K  | 139K  | 138K  |
   +----+-------+-------+------+-------+-------+-------+-------+-------+
   |9000| 113K  | 116K  | 69K  | 117K  | 130K  | 137K  | 138K  | 137K  |
   +----+-------+-------+------+-------+-------+-------+-------+-------+

                Table 6: Packets Per Second on 10G Ethernet

   +====+======+======+======+======+======+========+========+========+
   |Size|E + P |E + P |E + P |IP-TFS|IP-TFS| IP-TFS | Enet   | ESP    |
   +====+======+======+======+======+======+========+========+========+
   |MTU |590   |1514  |9014  |590   |1514  | 9014   | any    | any    |
   +====+======+======+======+======+======+========+========+========+
   |OH  |92    |92    |92    |96    |96    | 96     | 38     | 74     |
   +====+======+======+======+======+======+========+========+========+
   |40  |6.51% |2.60% |0.44% |84.36%|93.76%| 98.94% | 47.62% | 35.09% |
   +----+------+------+------+------+------+--------+--------+--------+
   |128 |20.85%|8.32% |1.42% |84.36%|93.76%| 98.94% | 77.11% | 63.37% |
   +----+------+------+------+------+------+--------+--------+--------+
   |256 |41.69%|16.64%|2.83% |84.36%|93.76%| 98.94% | 87.07% | 77.58% |
   +----+------+------+------+------+------+--------+--------+--------+
   |518 |84.36%|33.68%|5.73% |84.36%|93.76%| 98.94% | 93.17% | 87.50% |
   +----+------+------+------+------+------+--------+--------+--------+
   |576 |46.91%|37.45%|6.37% |84.36%|93.76%| 98.94% | 93.81% | 88.62% |
   +----+------+------+------+------+------+--------+--------+--------+
   |1442|78.28%|93.76%|15.95%|84.36%|93.76%| 98.94% | 97.43% | 95.12% |
   +----+------+------+------+------+------+--------+--------+--------+
   |1500|81.43%|48.76%|16.60%|84.36%|93.76%| 98.94% | 97.53% | 95.30% |
   +----+------+------+------+------+------+--------+--------+--------+
   |8942|80.91%|83.06%|98.94%|84.36%|93.76%| 98.94% | 99.58% | 99.18% |
   +----+------+------+------+------+------+--------+--------+--------+
   |9000|81.43%|83.60%|49.79%|84.36%|93.76%| 98.94% | 99.58% | 99.18% |
   +----+------+------+------+------+------+--------+--------+--------+

             Table 7: Percentage of Bandwidth on 10G Ethernet
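
   The per-packet L2 octet counts, packet rates, and bandwidth
   percentages can be derived as sketched below.  The function names
   are assumptions of this example, and the per-packet overheads (38
   for Ethernet, 74 for ESP, 54 and 58 for the tunnel cases) are taken
   from the text and the OH rows of the tables above.

```python
import math

LINK_BPS = 10_000_000_000   # 10G Ethernet
ENET_OVERHEAD = 38          # header, CRC, preamble, start, gap
ENET_MIN_PAYLOAD = 46
ESP_OVERHEAD = 74           # "OH" row for plain ESP
IPSEC_OVERHEAD = 54         # outer IP + ESP, padded ESP case
IPTFS_OVERHEAD = 58         # IPSEC_OVERHEAD + 4-octet AGGFRAG header
IP_OVERHEAD = 20


def enet_octets(size):
    """L2 octets for a plain Ethernet packet of `size` L3 octets."""
    return max(size, ENET_MIN_PAYLOAD) + ENET_OVERHEAD


def esp_octets(size):
    """L2 octets for a normal (unpadded) ESP packet."""
    return size + ESP_OVERHEAD


def esp_pad_octets(size, l3_mtu):
    """L2 octets for padded ESP: every frame is a full MTU."""
    outer_payload = l3_mtu - IPSEC_OVERHEAD
    frames = 1
    if size > outer_payload:
        frames = math.ceil((size - IP_OVERHEAD)
                           / (outer_payload - IP_OVERHEAD))
    return frames * (l3_mtu + ENET_OVERHEAD)


def iptfs_octets(size, l3_mtu):
    """L2 octets attributed to one inner packet (fractions allowed)."""
    return size * (l3_mtu + ENET_OVERHEAD) / (l3_mtu - IPTFS_OVERHEAD)


def packets_per_second(l2_octets):
    return LINK_BPS / (8 * l2_octets)


def bandwidth_percent(size, l2_octets):
    return 100 * size / l2_octets
```

   For a 40-octet packet on plain Ethernet, this gives 84 L2 octets,
   about 14.9M packets per second, and 47.62% of the link bandwidth.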

   A sometimes unexpected result of using an AGGFRAG tunnel (or any
   packet aggregating tunnel) is that, for small- to medium-sized
   packets, the available bandwidth is actually greater than plain
   Ethernet.  This is due to the reduction in Ethernet framing overhead.
   This increased bandwidth is paid for with an increase in latency.
   This latency is the time to send the unrelated octets in the outer
   tunnel frame.  The following table illustrates the latency for some
   common values on a 10G Ethernet link.  The table also includes
   latency introduced by padding if using ESP with padding.

             +======+=========+=========+=========+=========+
             | Size | ESP+Pad | ESP+Pad | IP-TFS  | IP-TFS  |
             +======+=========+=========+=========+=========+
             | MTU  | 1500    | 9000    | 1500    | 9000    |
             +======+=========+=========+=========+=========+
             | 40   | 1.12 us | 7.12 us | 1.17 us | 7.17 us |
             +------+---------+---------+---------+---------+
             | 128  | 1.05 us | 7.05 us | 1.10 us | 7.10 us |
             +------+---------+---------+---------+---------+
             | 256  | 0.95 us | 6.95 us | 1.00 us | 7.00 us |
             +------+---------+---------+---------+---------+
             | 518  | 0.74 us | 6.74 us | 0.79 us | 6.79 us |
             +------+---------+---------+---------+---------+
             | 576  | 0.70 us | 6.70 us | 0.74 us | 6.74 us |
             +------+---------+---------+---------+---------+
             | 1442 | 0.00 us | 6.00 us | 0.05 us | 6.05 us |
             +------+---------+---------+---------+---------+
             | 1500 | 1.20 us | 5.96 us | 0.00 us | 6.00 us |
             +------+---------+---------+---------+---------+

                          Table 8: Added Latency
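
   The latency figures follow from the serialization time of the
   octets that do not belong to the inner packet.  The sketch below is
   illustrative; the function name is an assumption, and the choice of
   which octets count as unrelated (the padded ESP overhead from
   Table 3, or the MTU minus the inner packet size for IP-TFS) is
   inferred from the tabulated values rather than stated in the text.

```python
def added_latency_us(unrelated_octets, link_bps=10_000_000_000):
    """Microseconds needed to serialize `unrelated_octets` on the link."""
    return unrelated_octets * 8 * 1_000_000 / link_bps


# ESP with padding, 1500 MTU, 40-octet inner packet: 1406 octets of
# padding (Table 3).
esp_pad_40 = added_latency_us(1406)

# IP-TFS, 1500 MTU, 40-octet inner packet: assumed 1500 - 40 octets.
iptfs_40 = added_latency_us(1500 - 40)
```

   These evaluate to roughly 1.12 us and 1.17 us, respectively.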

   Notice that the latency values are very similar between the two
   solutions; however, whereas IP-TFS provides for constant high
   bandwidth, in some cases even exceeding plain Ethernet, ESP with
   padding often greatly reduces available bandwidth.

Acknowledgements

   We would like to thank Don Fedyk for help in reviewing and editing
   this work.  We would also like to thank Michael Richardson, Sean
   Turner, Valery Smyslov, and Tero Kivinen for reviews and many
   suggestions for improvements, as well as Joseph Touch for the
   transport area review and suggested improvements.

Contributors

   The following person made significant contributions to this document.

   Lou Berger
   LabN Consulting, L.L.C.
   Email: lberger@labn.net


Author's Address

   Christian Hopps
   LabN Consulting, L.L.C.
   Email: chopps@chopps.org