Internet Engineering Task Force (IETF)                         T. Talpey
Request for Comments: 5666                                  Unaffiliated
Category: Standards Track                                   B. Callaghan
ISSN: 2070-1721                                                    Apple
                                                            January 2010

     Remote Direct Memory Access Transport for Remote Procedure Call
Abstract
This document describes a protocol providing Remote Direct Memory
Access (RDMA) as a new transport for Remote Procedure Call (RPC).
The RDMA transport binding conveys the benefits of efficient, bulk-
data transport over high-speed networks, while providing for minimal
change to RPC applications and with no required revision of the
application RPC protocol, or the RPC protocol itself.
Status of This Memo
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 5741.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc5666.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
Table of Contents
   1. Introduction
      1.1. Requirements Language
   2. Abstract RDMA Requirements
   3. Protocol Outline
      3.1. Short Messages
      3.2. Data Chunks
      3.3. Flow Control
      3.4. XDR Encoding with Chunks
      3.5. XDR Decoding with Read Chunks
      3.6. XDR Decoding with Write Chunks
      3.7. XDR Roundup and Chunks
      3.8. RPC Call and Reply
      3.9. Padding
   4. RPC RDMA Message Layout
      4.1. RPC-over-RDMA Header
      4.2. RPC-over-RDMA Header Errors
      4.3. XDR Language Description
   5. Long Messages
      5.1. Message as an RDMA Read Chunk
      5.2. RDMA Write of Long Replies (Reply Chunks)
   6. Connection Configuration Protocol
      6.1. Initial Connection State
      6.2. Protocol Description
   7. Memory Registration Overhead
   8. Errors and Error Recovery
   9. Node Addressing
   10. RPC Binding
   11. Security Considerations
   12. IANA Considerations
   13. Acknowledgments
   14. References
      14.1. Normative References
      14.2. Informative References
1. Introduction
Remote Direct Memory Access (RDMA) [RFC5040, RFC5041], [IB] is a
technique for efficient movement of data between end nodes, which
becomes increasingly compelling over high-speed transports. By
directing data into destination buffers as it is sent on a network,
and placing it via direct memory access by hardware, the double
benefit of faster transfers and reduced host overhead is obtained.
Open Network Computing Remote Procedure Call (ONC RPC, or simply,
RPC) [RFC5531] is a remote procedure call protocol that has been run
over a variety of transports. Most RPC implementations today use UDP
or TCP. RPC messages are defined in terms of an eXternal Data
Representation (XDR) [RFC4506], which provides a canonical data
representation across a variety of host architectures. An XDR data
stream is conveyed differently on each type of transport. On UDP,
RPC messages are encapsulated inside datagrams, while on a TCP byte
stream, RPC messages are delineated by a record marking protocol. An
RDMA transport also conveys RPC messages in a unique fashion that
must be fully described if client and server implementations are to
interoperate.
RDMA transports present new semantics unlike the behaviors of either
UDP or TCP alone. They retain message delineations like UDP while
also providing a reliable, sequenced data transfer like TCP. Also,
they provide the new efficient, bulk-transfer service of RDMA. RDMA
transports are therefore naturally viewed as a new transport type by
RPC.
RDMA as a transport will benefit the performance of RPC protocols
that move large "chunks" of data, since RDMA hardware excels at
moving data efficiently between host memory and a high-speed network
with little or no host CPU involvement. In this context, the Network
File System (NFS) protocol, in all its versions [RFC1094] [RFC1813]
[RFC3530] [RFC5661], is an obvious beneficiary of RDMA. A complete
problem statement is discussed in [RFC5532], and related NFSv4 issues
are discussed in [RFC5661]. Many other RPC-based protocols will also
benefit.
Although the RDMA transport described here provides relatively
transparent support for any RPC application, the proposal goes
further in describing mechanisms that can optimize the use of RDMA
with more active participation by the RPC application.
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
2. Abstract RDMA Requirements
An RPC transport is responsible for conveying an RPC message from a
sender to a receiver. An RPC message is either an RPC call from a
client to a server, or an RPC reply from the server back to the
client. An RPC message contains an RPC call header followed by
arguments if the message is an RPC call, or an RPC reply header
followed by results if the message is an RPC reply. The call header
contains a transaction ID (XID) followed by the program and procedure
number as well as a security credential. An RPC reply header begins
with an XID that matches that of the RPC call message, followed by a
security verifier and results. All data in an RPC message is XDR
encoded. For a complete description of the RPC protocol and XDR
encoding, see [RFC5531] and [RFC4506].
This protocol assumes the following abstract model for RDMA
transports. These terms, common in the RDMA lexicon, are used in
this document. A more complete glossary of RDMA terms can be found
in [RFC5040].
o Registered Memory
All data moved via tagged RDMA operations is resident in
registered memory at its destination. This protocol assumes
that each segment of registered memory MUST be identified with a
steering tag of no more than 32 bits and memory addresses of up
to 64 bits in length.
o RDMA Send
The RDMA provider supports an RDMA Send operation with
completion signaled at the receiver when data is placed in a
pre-posted buffer. The amount of transferred data is limited
only by the size of the receiver's buffer. Sends complete at
the receiver in the order they were issued at the sender.
o RDMA Write
The RDMA provider supports an RDMA Write operation to directly
place data in the receiver's buffer. An RDMA Write is initiated
by the sender and completion is signaled at the sender. No
completion is signaled at the receiver. The sender uses a
steering tag, memory address, and length of the remote
destination buffer. RDMA Writes are not necessarily ordered
with respect to one another, but are ordered with respect to
RDMA Sends; a subsequent RDMA Send completion obtained at the
receiver guarantees that prior RDMA Write data has been
successfully placed in the receiver's memory.
o RDMA Read
The RDMA provider supports an RDMA Read operation to directly
place peer source data in the requester's buffer. An RDMA Read
is initiated by the receiver and completion is signaled at the
receiver. The receiver provides steering tags, memory
addresses, and a length for the remote source and local
destination buffers. Since the peer at the data source receives
no notification of RDMA Read completion, there is an assumption
that on receiving the data, the receiver will signal completion
with an RDMA Send message, so that the peer can free the source
buffers and the associated steering tags.
This protocol is designed to be carried over all RDMA transports
meeting the stated requirements. This protocol conveys to the RPC
peer information sufficient for that RPC peer to direct an RDMA layer
to perform transfers containing RPC data and to communicate their
result(s). For example, it is readily carried over RDMA transports
such as Internet Wide Area RDMA Protocol (iWARP) [RFC5040, RFC5041],
or InfiniBand [IB].
3. Protocol Outline
An RPC message can be conveyed in identical fashion, whether it is a
call or reply message. In each case, the transmission of the message
proper is preceded by transmission of a transport-specific header for
use by RPC-over-RDMA transports. This header is analogous to the
record marking used for RPC over TCP, but is more extensive, since
RDMA transports support several modes of data transfer; it is
important to allow the upper-layer protocol to specify the most
efficient mode for each of the segments in a message. Multiple
segments of a message may thereby be transferred in different ways to
different remote memory destinations.
All transfers of a call or reply begin with an RDMA Send that
transfers at least the RPC-over-RDMA header, usually with the call or
reply message appended, or at least some part thereof. Because the
size of what may be transmitted via RDMA Send is limited by the size
of the receiver's pre-posted buffer, the RPC-over-RDMA transport
provides a number of methods to reduce the amount transferred by
means of the RDMA Send, when necessary, by transferring various parts
of the message using RDMA Read and RDMA Write.
RPC-over-RDMA framing replaces all other RPC framing (such as TCP
record marking) when used atop an RPC/RDMA association, even though
the underlying RDMA protocol may itself be layered atop a protocol
with a defined RPC framing (such as TCP). It is however possible for
RPC/RDMA to be dynamically enabled, in the course of negotiating the
use of RDMA via an upper-layer exchange. Because RPC framing
delimits an entire RPC request or reply, the resulting shift in
framing must occur between distinct RPC messages, and in concert with
the transport.
3.1. Short Messages
Many RPC messages are quite short. For example, the NFS version 3
GETATTR request is only 56 bytes: 20 bytes of RPC header, plus a
32-byte file handle argument and 4 bytes of length. The reply to this
common request is about 100 bytes.
There is no benefit in transferring such small messages with an RDMA
Read or Write operation. The overhead in transferring steering tags
and memory addresses is justified only by large transfers. The
critical message size that justifies RDMA transfer will vary
depending on the RDMA implementation and network, but is typically of
the order of a few kilobytes. It is appropriate to transfer a short
message with an RDMA Send to a pre-posted buffer. The RPC-over-RDMA
header with the short message (call or reply) immediately following
is transferred using a single RDMA Send operation.
Short RPC messages over an RDMA transport:
      RPC Client                           RPC Server
          |               RPC Call             |
     Send |   ------------------------------>  |
          |                                    |
          |               RPC Reply            |
          |   <------------------------------  | Send
3.2. Data Chunks
Some protocols, like NFS, have RPC procedures that can transfer very
large chunks of data in the RPC call or reply and would cause the
maximum send size to be exceeded if one tried to transfer them as
part of the RDMA Send. These large chunks typically range from a
kilobyte to a megabyte or more. An RDMA transport can transfer large
chunks of data more efficiently via the direct placement of an RDMA
Read or RDMA Write operation. Using direct placement instead of
inline transfer not only avoids expensive data copies, but provides
correct data alignment at the destination.
3.3. Flow Control
It is critical to provide RDMA Send flow control for an RDMA
connection. RDMA receive operations will fail if a pre-posted
receive buffer is not available to accept an incoming RDMA Send, and
repeated occurrences of such errors can be fatal to the connection.
This is a departure from conventional TCP/IP networking where buffers
are allocated dynamically on an as-needed basis, and where
pre-posting is not required.
It is not practical to provide for fixed credit limits at the RPC
server. Fixed limits scale poorly, since posted buffers are
dedicated to the associated connection until consumed by receive
operations. Additionally, for protocol correctness, the RPC server
must always be able to reply to client requests, whether or not new
buffers have been posted to accept future receives. (Note that the
RPC server may in fact be a client at some other layer. For example,
NFSv4 callbacks are processed by the NFSv4 client, acting as an RPC
server. The credit discussions apply equally in either case.)
Flow control for RDMA Send operations is implemented as a simple
request/grant protocol in the RPC-over-RDMA header associated with
each RPC message. The RPC-over-RDMA header for RPC call messages
contains a requested credit value for the RPC server, which MAY be
dynamically adjusted by the caller to match its expected needs. The
RPC-over-RDMA header for the RPC reply messages provides the granted
result, which MAY have any value except it MUST NOT be zero when no
in-progress operations are present at the server, since such a value
would result in deadlock. The value MAY be adjusted up or down at
each opportunity to match the server's needs or policies.
The RPC client MUST NOT send unacknowledged requests in excess of
this granted RPC server credit limit. If the limit is exceeded, the
RDMA layer may signal an error, possibly terminating the connection.
Even if an error does not occur, it is OPTIONAL that the server
handle the excess request(s), and it MAY return an RPC error to the
client. Also note that the never-zero requirement implies that an
RPC server MUST always provide at least one credit to each connected
RPC client from which no requests are outstanding. The client would
deadlock otherwise, unable to send another request.
While RPC calls complete in any order, the current flow control limit
at the RPC server is known to the RPC client from the Send ordering
properties. It is always the most recent server-granted credit value
minus the number of requests in flight.
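The credit accounting described above can be summarized by the
following non-normative C sketch. The structure, field, and function
names are purely illustrative and are not defined by this protocol; a
real implementation must also account for Done messages (Section 3.8).

   /* Non-normative sketch of client-side credit accounting. */
   #include <stdbool.h>
   #include <stdint.h>

   struct rpcrdma_credits {
       uint32_t granted;    /* most recent server-granted credit value */
       uint32_t in_flight;  /* requests sent but not yet answered      */
   };

   /* May another request (or Done message) be sent right now? */
   static bool credit_available(const struct rpcrdma_credits *c)
   {
       return c->in_flight < c->granted;
   }

   /* Record a newly sent request. */
   static void credit_consume(struct rpcrdma_credits *c)
   {
       c->in_flight++;
   }

   /* Record a received reply, which carries the server's new grant. */
   static void credit_update(struct rpcrdma_credits *c, uint32_t rdma_credit)
   {
       c->granted = rdma_credit;
       if (c->in_flight > 0)
           c->in_flight--;
   }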
Certain RDMA implementations may impose additional flow control
restrictions, such as limits on RDMA Read operations in progress at
the responder. Because these operations are outside the scope of
this protocol, they are not addressed and SHOULD be provided for by
other layers. For example, a simple upper-layer RPC consumer might
perform single-issue RDMA Read requests, while a more sophisticated,
multithreaded RPC consumer might implement its own First In, First
Out (FIFO) queue of such operations. For further discussion of
possible protocol implementations capable of negotiating these
values, see Section 6 "Connection Configuration Protocol" of this
document, or [RFC5661].
3.4. XDR Encoding with Chunks
The data comprising an RPC call or reply message is marshaled or
serialized into a contiguous stream by an XDR routine. XDR data
types such as integers, strings, arrays, and linked lists are
commonly implemented over two very simple functions that encode
either an XDR data unit (32 bits) or an array of bytes.
Normally, the separate data items in an RPC call or reply are encoded
as a contiguous sequence of bytes for network transmission over UDP
or TCP. However, in the case of an RDMA transport, local routines
such as XDR encode can determine that (for instance) an opaque byte
array is large enough to be more efficiently moved via an RDMA data
transfer operation like RDMA Read or RDMA Write.
Semantically speaking, the protocol has no restriction regarding data
types that may or may not be represented by a read or write chunk.
In practice however, efficiency considerations lead to the conclusion
that certain data types are not generally "chunkable". Typically,
only those opaque and aggregate data types that may attain
substantial size are considered to be eligible. With today's
hardware, this size may be a kilobyte or more. However, any object
MAY be chosen for chunking in any given message.
The eligibility of XDR data items to be candidates for being moved as
data chunks (as opposed to being marshaled inline) is not specified
by the RPC-over-RDMA protocol. Chunk eligibility criteria MUST be
determined by each upper-layer in order to provide for an
interoperable specification. One such example with rationale, for
the NFS protocol family, is provided in [RFC5667].
The interface by which an upper-layer implementation communicates the
eligibility of a data item locally to RPC for chunking is out of
scope for this specification. In many implementations, it is
possible to implement a transparent RPC chunking facility. However,
such implementations may lead to inefficiencies, either because they
require the RPC layer to perform expensive registration and
de-registration of memory "on the fly", or they may require using
RDMA chunks in reply messages, along with the resulting additional
handshaking with the RPC-over-RDMA peer. However, these issues are
internal and generally confined to the local interface between RPC
and its upper layers, one in which implementations are free to
innovate. The only requirement is that the resulting RPC RDMA
protocol sent to the peer is valid for the upper layer. See, for
example, [RFC5667].
When sending any message (request or reply) that contains an eligible
large data chunk, the XDR encoding routine avoids moving the data
into the XDR stream. Instead, it does not encode the data portion,
but records the address and size of each chunk in a separate "read
chunk list" encoded within RPC RDMA transport-specific headers. Such
chunks will be transferred via RDMA Read operations initiated by the
receiver.
When the read chunks are to be moved via RDMA, the memory for each
chunk is registered. This registration may take place within XDR
itself, providing for full transparency to upper layers, or it may be
performed by any other specific local implementation.
Additionally, when making an RPC call that can result in bulk data
transferred in the reply, write chunks MAY be provided to accept the
data directly via RDMA Write. These write chunks will therefore be
pre-filled by the RPC server prior to responding, and XDR decode of
the data at the client will not be required. These chunks undergo a
similar registration and advertisement via "write chunk lists" built
as a part of XDR encoding.
Some RPC client implementations are not able to determine where an
RPC call's results reside during the "encode" phase. This makes it
difficult or impossible for the RPC client layer to encode the write
chunk list at the time of building the request. In this case, it is
difficult for the RPC implementation to provide transparency to the
RPC consumer, which may require recoding to provide result
information at this earlier stage.
Therefore, if the RPC client does not make a write chunk list
available to receive the result, then the RPC server MAY return data
inline in the reply, or if the upper-layer specification permits, it
MAY be returned via a read chunk list. It is NOT RECOMMENDED that
upper-layer RPC client protocol specifications omit write chunk lists
for eligible replies, due to the lower performance of the additional
handshaking to perform data transfer, and the requirement that the
RPC server must expose (and preserve) the reply data for a period of
time. In the absence of a server-provided read chunk list in the
reply, if the encoded reply overflows the posted receive buffer, the
RPC will fail with an RDMA transport error.
When any data within a message is provided via either read or write
chunks, the chunk itself refers only to the data portion of the XDR
stream element. In particular, for counted fields (e.g., a "<>"
encoding) the byte count that is encoded as part of the field remains
in the XDR stream, and is also encoded in the chunk list. The data
portion is however elided from the encoded XDR stream, and is
transferred as part of chunk list processing. It is important to
maintain upper-layer implementation compatibility -- both the count
and the data must be transferred as part of the logical XDR stream.
While the chunk list processing results in the data being available
to the upper-layer peer for XDR decoding, the length present in the
chunk list entries is not. Any byte count in the XDR stream MUST
match the sum of the byte counts present in the corresponding read or
write chunk list. If they do not agree, an RPC protocol encoding
error results.
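As a non-normative illustration of this consistency check, a receiver
might compare the inline byte count with the chunk's segment lengths
roughly as follows. The structure mirrors the xdr_rdma_segment
definition in Section 4.3; the function name is hypothetical.

   #include <stddef.h>
   #include <stdint.h>

   struct rdma_segment {
       uint32_t handle;   /* steering tag                    */
       uint32_t length;   /* length of this segment in bytes */
       uint64_t offset;   /* remote memory address or offset */
   };

   /* Return nonzero if the byte count encoded in the XDR stream equals
    * the sum of the segment lengths of the corresponding chunk.  A
    * mismatch is an RPC protocol encoding error. */
   static int chunk_count_consistent(uint32_t xdr_byte_count,
                                     const struct rdma_segment *segs,
                                     size_t nsegs)
   {
       uint64_t sum = 0;
       for (size_t i = 0; i < nsegs; i++)
           sum += segs[i].length;
       return sum == (uint64_t)xdr_byte_count;
   }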
The following items are contained in a chunk list entry.
Handle
Steering tag or handle obtained when the chunk memory is
registered for RDMA.
Length
The length of the chunk in bytes.
Offset
The offset or beginning memory address of the chunk. In order
to support the widest array of RDMA implementations, as well as
the most general steering tag scheme, this field is
unconditionally included in each chunk list entry.
While zero-based offset schemes are available in many RDMA
implementations, their use by RPC requires individual
registration of each read or write chunk. On many such
implementations, this can be a significant overhead. By
providing an offset in each chunk, many pre-registration or
region-based registrations can be readily supported, and by
using a single, universal chunk representation, the RPC RDMA
protocol implementation is simplified to its most general form.
Position
For data that is to be encoded, the position in the XDR stream
where the chunk would normally reside. Note that the chunk
therefore inserts its data into the XDR stream at this position,
but its transfer is no longer "inline". Also note therefore
that all chunks belonging to a single RPC argument or result
will have the same position. For data that is to be decoded, no
position is used.
When XDR marshaling is complete, the chunk list is XDR encoded, then
sent to the receiver prepended to the RPC message. Any source data
for a read chunk, or the destination of a write chunk, remain behind
in the sender's registered memory, and their actual payload is not
marshaled into the request or reply.
   +----------------+----------------+-------------
   |  RPC-over-RDMA |                |
   |    header w/   |   RPC Header   | Non-chunk args/results
   |     chunks     |                |
   +----------------+----------------+-------------
Read chunk lists and write chunk lists are structured somewhat
differently. This is due to the different usage -- read chunks are
decoded and indexed by their argument's or result's position in the
XDR data stream; their size is always known. Write chunks, on the
other hand, are used only for results, and have neither a preassigned
offset in the XDR stream nor a size until the results are produced,
since the buffers may be only partially filled, or may not be used
for results at all. Their presence in the XDR stream is therefore
not known until the reply is processed. The mapping of write chunks
onto designated NFS procedures and their results is described in
[RFC5667].
Therefore, read chunks are encoded into a read chunk list as a single
array, with each entry tagged by its (known) size and its argument's
or result's position in the XDR stream. Write chunks are encoded as
a list of arrays of RDMA buffers, with each list element (an array)
providing buffers for a separate result. Individual write chunk list
elements MAY thereby result in being partially or fully filled, or in
fact not being filled at all. Unused write chunks, or unused bytes
in write chunk buffer lists, are not returned as results, and their
memory is returned to the upper layer as part of RPC completion.
However, the RPC layer MUST NOT assume that the buffers have not been
modified.
3.5. XDR Decoding with Read Chunks
The XDR decode process moves data from an XDR stream into a data
structure provided by the RPC client or server application. Where
elements of the destination data structure are buffers or strings,
the RPC application can either pre-allocate storage to receive the
data or leave the string or buffer fields null and allow the XDR
decode stage of RPC processing to automatically allocate storage of
sufficient size.
When decoding a message from an RDMA transport, the receiver first
XDR decodes the chunk lists from the RPC-over-RDMA header, then
proceeds to decode the body of the RPC message (arguments or
results). Whenever the XDR offset in the decode stream matches that
of a chunk in the read chunk list, the XDR routine initiates an RDMA
Read to bring over the chunk data into locally registered memory for
the destination buffer.
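The decode step can be pictured with the following non-normative C
sketch. The rdma_read_sync() call stands in for whatever RDMA Read
interface the local provider offers; it, and the type names, are
assumptions made only for illustration.

   #include <stddef.h>
   #include <stdint.h>

   struct rdma_segment { uint32_t handle, length; uint64_t offset; };
   struct read_chunk   { uint32_t position; struct rdma_segment target; };

   /* Hypothetical provider call: issue an RDMA Read and wait for it. */
   extern int rdma_read_sync(uint32_t handle, uint64_t offset,
                             uint32_t length, void *dest);

   /* If a read chunk is registered at the current XDR position, pull
    * its data into locally registered memory; otherwise decode inline. */
   static int fetch_chunk_at(uint32_t xdr_pos,
                             const struct read_chunk *chunks, size_t n,
                             void *dest)
   {
       for (size_t i = 0; i < n; i++) {
           if (chunks[i].position == xdr_pos)
               return rdma_read_sync(chunks[i].target.handle,
                                     chunks[i].target.offset,
                                     chunks[i].target.length, dest);
       }
       return 0;   /* no chunk here; continue with inline data */
   }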
When processing an RPC request, the RPC receiver (RPC server)
acknowledges its completion of use of the source buffers by simply
replying to the RPC sender (client), and the peer may then free all
source buffers advertised by the request.
When processing an RPC reply, after completing such a transfer, the
RPC receiver (client) MUST issue an RDMA_DONE message (described in
Section 3.8) to notify the peer (server) that the source buffers can
be freed.
The read chunk list is constructed and used entirely within the
RPC/XDR layer. Other than specifying the minimum chunk size, the
management of the read chunk list is automatic and transparent to an
RPC application.
3.6. XDR Decoding with Write Chunks
When a write chunk list is provided for the results of the RPC call,
the RPC server MUST provide any corresponding data via RDMA Write to
the memory referenced in the chunk list entries. The RPC reply
conveys this by returning the write chunk list to the client with the
lengths rewritten to match the actual transfer. The XDR decode of
the reply therefore performs no local data transfer but merely
returns the length obtained from the reply.
Each decoded result consumes one entry in the write chunk list, which
in turn consists of an array of RDMA segments. The length is
therefore the sum of all returned lengths in all segments comprising
the corresponding list entry. As each list entry is decoded, the
entire entry is consumed.
The write chunk list is constructed and used by the RPC application.
The RPC/XDR layer simply conveys the list between client and server
and initiates the RDMA Writes back to the client. The mapping of
write chunk list entries to procedure arguments MUST be determined
for each protocol. An example of a mapping is described in
[RFC5667].
3.7. XDR Roundup and Chunks
The XDR protocol requires 4-byte alignment of each new encoded
element in any XDR stream. This requirement is for efficiency and
ease of decode/unmarshaling at the receiver -- if the XDR stream
buffer begins on a native machine boundary, then the XDR elements
will lie on similarly predictable offsets in memory.
Within XDR, when non-4-byte encodes (such as an odd-length string or
bulk data) are marshaled, their length is encoded literally, while
their data is padded to begin the next element at a 4-byte boundary
in the XDR stream. For TCP or RDMA inline encoding, this minimal
overhead is required because the transport-specific framing relies on
the fact that the relative offset of the elements in the XDR stream
from the start of the message determines the XDR position during
decode.
On the other hand, RPC/RDMA Read chunks carry the XDR position of
each chunked element and length of the Chunk segment, and can be
placed by the receiver exactly where they belong in the receiver's
memory without regard to the alignment of their position in the XDR
stream. Since any rounded-up data is not actually part of the upper
layer's message, the receiver will not reference it, and there is no
reason to set it to any particular value in the receiver's memory.
When roundup is present at the end of a sequence of chunks, the
length of the sequence will terminate it at a non-4-byte XDR
position. When the receiver proceeds to decode the remaining part of
the XDR stream, it inspects the XDR position indicated by the next
chunk. Because this position will not match (else roundup would not
have occurred), the receiver decoding will fall back to inspecting
the remaining inline portion. If in turn, no data remains to be
decoded from the inline portion, then the receiver MUST conclude that
roundup is present, and therefore it advances the XDR decode position
to that indicated by the next chunk (if any). In this way, roundup
is passed without ever actually transferring additional XDR bytes.
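A non-normative sketch of the receiver's roundup handling follows; the
parameter names are illustrative only.

   #include <stdint.h>

   /* Decide where decoding continues after a sequence of read chunks.
    * If the next chunk's XDR position does not match the current
    * position and no inline data remains, roundup was elided by the
    * sender and the decoder simply advances to the next chunk. */
   static uint32_t next_decode_position(uint32_t cur_pos,
                                        uint32_t inline_bytes_left,
                                        int have_next_chunk,
                                        uint32_t next_chunk_pos)
   {
       if (have_next_chunk && next_chunk_pos != cur_pos &&
           inline_bytes_left == 0)
           return next_chunk_pos;   /* skip the (absent) pad bytes */
       return cur_pos;              /* continue with inline decode */
   }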
Some protocol operations over RPC/RDMA, for instance NFS writes of
data encountered at the end of a file or in direct I/O situations,
commonly yield these roundups within RDMA Read Chunks. Because any
roundup bytes are not actually present in the data buffers being
written, memory for these bytes would come from noncontiguous
buffers, either as an additional memory registration segment or as an
additional Chunk. The overhead of these operations can be
significant to both the sender to marshal them and even higher to the
receiver to which to transfer them. Senders SHOULD therefore avoid
encoding individual RDMA Read Chunks for roundup whenever possible.
It is acceptable, but not necessary, to include roundup data in an
existing RDMA Read Chunk, but only if it is already present in the
XDR stream to carry upper-layer data.
Note that there is no exposure of additional data at the sender due
to eliding roundup data from the XDR stream, since any additional
sender buffers are never exposed to the peer. The data is literally
not there to be transferred.
For RDMA Write Chunks, a simpler encoding method applies. Again,
roundup bytes are not transferred, instead the chunk length sent to
the receiver in the reply is simply increased to include any roundup.
Because of the requirement that the RDMA Write Chunks are filled
sequentially without gaps, this situation can only occur on the final
chunk receiving data. Therefore, there is no opportunity for roundup
data to insert misalignment or positional gaps into the XDR stream.
3.8. RPC Call and Reply
The RDMA transport for RPC provides three methods of moving data
between RPC client and server:
   Inline
      Data is moved between RPC client and server within an RDMA Send.

   RDMA Read
      Data is moved between RPC client and server via an RDMA Read
      operation, using a steering tag, memory address, and length
      obtained from a read chunk list.

   RDMA Write
      Result data is moved from RPC server to client via an RDMA Write
      operation, using a steering tag, memory address, and length
      obtained from a write chunk list or reply chunk in the client's
      RPC call message.
These methods of data movement may occur in combinations within a
single RPC. For instance, an RPC call may contain some inline data
along with some large chunks to be transferred via RDMA Read to the
server. The reply to that call may have some result chunks that the
server RDMA Writes back to the client. The following protocol
interactions illustrate RPC calls that use these methods to move RPC
message data:
An RPC with write chunks in the call message:
      RPC Client                           RPC Server
          |    RPC Call + Write Chunk list     |
     Send |   ------------------------------>  |
          |                                    |
          |               Chunk 1              |
          |   <------------------------------  | Write
          |                  :                 |
          |               Chunk n              |
          |   <------------------------------  | Write
          |                                    |
          |               RPC Reply            |
          |   <------------------------------  | Send
In the presence of write chunks, RDMA ordering provides the guarantee
that all data in the RDMA Write operations has been placed in memory
prior to the client's RPC reply processing.
An RPC with read chunks in the call message:
      RPC Client                           RPC Server
          |     RPC Call + Read Chunk list     |
     Send |   ------------------------------>  |
          |                                    |
          |               Chunk 1              |
          |   +------------------------------  | Read
          |   v----------------------------->  |
          |                  :                 |
          |               Chunk n              |
          |   +------------------------------  | Read
          |   v----------------------------->  |
          |                                    |
          |               RPC Reply            |
          |   <------------------------------  | Send
An RPC with read chunks in the reply message:
      RPC Client                           RPC Server
          |               RPC Call             |
     Send |   ------------------------------>  |
          |                                    |
          |    RPC Reply + Read Chunk list     |
          |   <------------------------------  | Send
          |                                    |
          |               Chunk 1              |
     Read |   ------------------------------+  |
          |   <-----------------------------v  |
          |                  :                 |
          |               Chunk n              |
     Read |   ------------------------------+  |
          |   <-----------------------------v  |
          |                                    |
          |                Done                |
     Send |   ------------------------------>  |
The final Done message allows the RPC client to signal the server
that it has received the chunks, so the server can de-register and
free the memory holding the chunks. A Done completion is not
necessary for an RPC call, since the RPC reply Send is itself a
receive completion notification. In the event that the client fails
to return the Done message within some timeout period, the server MAY
conclude that a protocol violation has occurred and close the RPC
connection, or it MAY proceed with a de-register and free its chunk
buffers. This may result in a fatal RDMA error if the client later
attempts to perform an RDMA Read operation, which amounts to the same
thing.
The use of read chunks in RPC reply messages is much less efficient
than providing write chunks in the originating RPC calls, due to the
additional message exchanges, the need for the RPC server to
advertise buffers to the peer, the necessity of the server
maintaining a timer for the purpose of recovery from misbehaving
clients, and the need for additional memory registration. Their use
is NOT RECOMMENDED by upper layers where efficiency is a primary
concern [RFC5667]. However, they MAY be employed by upper-layer
protocol bindings that are primarily concerned with transparency,
since they can frequently be implemented completely within the RPC
lower layers.
It is important to note that the Done message consumes a credit at
the RPC server. The RPC server SHOULD provide sufficient credits to
the client to allow the Done message to be sent without deadlock
(driving the outstanding credit count to zero). The RPC client MUST
account for its required Done messages to the server in its
accounting of available credits, and the server SHOULD replenish any
credit consumed by its use of such exchanges at its earliest
opportunity.
Finally, it is possible to conceive of RPC exchanges that involve any
or all combinations of write chunks in the RPC call, read chunks in
the RPC call, and read chunks in the RPC reply. Support for such
exchanges is straightforward from a protocol perspective, but in
practice such exchanges would be quite rare, limited to upper-layer
protocol exchanges that transferred bulk data in both the call and
corresponding reply.
3.9. Padding
Alignment of specific opaque data enables certain scatter/gather
optimizations. Padding leverages the useful property that RDMA
transfers preserve alignment of data, even when they are placed into
pre-posted receive buffers by Sends.
Many servers can make good use of such padding. Padding allows the
chaining of RDMA receive buffers such that any data transferred by
RDMA on behalf of RPC requests will be placed into appropriately
aligned buffers on the system that receives the transfer. In this
way, the need for servers to perform RDMA Read to satisfy all but the
largest client writes is obviated.
The effect of padding is demonstrated below showing prior bytes on an
XDR stream ("XXX" in the figure below) followed by an opaque field
consisting of four length bytes ("LLLL") followed by data bytes
("DDD"). The receiver of the RDMA Send has posted two chained
receive buffers. Without padding, the opaque data is split across
the two buffers. With the addition of padding bytes ("ppp") prior to
the first data byte, the data can be forced to align correctly in the
second buffer.
                                     Buffer 1        Buffer 2
   Unpadded                          --------------  --------------
   XXXXXXXLLLLDDDDDDDDDDDDDD    ---> XXXXXXXLLLLDDD  DDDDDDDDDDD

   Padded
   XXXXXXXLLLLpppDDDDDDDDDDDDDD ---> XXXXXXXLLLLppp  DDDDDDDDDDDDDD
Padding is implemented completely within the RDMA transport encoding,
flagged with a specific message type. Where padding is applied, two
values are passed to the peer: an "rdma_align", which is the padding
value used, and "rdma_thresh", which is the opaque data size at or
above which padding is applied. For instance, if the server is using
chained 4 KB receive buffers, then up to (4 KB - 1) padding bytes
could be used to achieve alignment of the data. The XDR routine at
the peer MUST consult these values when decoding opaque values.
Where the decoded length exceeds the rdma_thresh, the XDR decode MUST
skip over the appropriate padding as indicated by rdma_align and the
current XDR stream position.
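The following non-normative C sketch shows one plausible way a decoder
might compute the pad to skip for an RDMA_MSGP message. The exact
computation from rdma_align and the current stream position is an
implementation assumption, not something this document prescribes.

   #include <stdint.h>

   /* data_pos is the XDR stream position of the field's first data
    * byte (immediately after its 4-byte length).  Pad the data start
    * up to the next rdma_align boundary, but only for opaque fields at
    * or above rdma_thresh bytes. */
   static uint32_t msgp_pad_bytes(uint32_t data_pos, uint32_t field_len,
                                  uint32_t rdma_align, uint32_t rdma_thresh)
   {
       if (rdma_align == 0 || field_len < rdma_thresh)
           return 0;                               /* padding not applied */
       return (rdma_align - (data_pos % rdma_align)) % rdma_align;
   }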
4. RPC RDMA Message Layout
RPC call and reply messages are conveyed across an RDMA transport
with a prepended RPC-over-RDMA header. The RPC-over-RDMA header
includes data for RDMA flow control credits, padding parameters, and
lists of addresses that provide direct data placement via RDMA Read
and Write operations. The layout of the RPC message itself is
unchanged from that described in [RFC5531] except for the possible
exclusion of large data chunks that will be moved by RDMA Read or
Write operations. If the RPC message (along with the RPC-over-RDMA
header) is too long for the posted receive buffer (even after any
large chunks are removed), then the entire RPC message MAY be moved
separately as a chunk, leaving just the RPC-over-RDMA header in the
RDMA Send.
4.1. RPC-over-RDMA Header
The RPC-over-RDMA header begins with four 32-bit fields that are
always present and that control the RDMA interaction including RDMA-
specific flow control. These are then followed by a number of items
such as chunk lists and padding that MAY or MUST NOT be present
depending on the type of transmission. The four fields that are
always present are:
1. Transaction ID (XID).
The XID generated for the RPC call and reply. Having the XID at
the beginning of the message makes it easy to establish the
message context. This XID MUST be the same as the XID in the RPC
header. The receiver MAY perform its processing based solely on
the XID in the RPC-over-RDMA header, and thereby ignore the XID in
the RPC header, if it so chooses.
2. Version number.
This version of the RPC RDMA message protocol is 1. The version
number MUST be increased by 1 whenever the format of the RPC RDMA
messages is changed.
3. Flow control credit value.
When sent in an RPC call message, the requested value is provided.
When sent in an RPC reply message, the granted value is returned.
RPC calls SHOULD NOT be sent in excess of the currently granted
limit.
4. Message type.
o RDMA_MSG = 0 indicates that chunk lists and RPC message follow.
o RDMA_NOMSG = 1 indicates that after the chunk lists there is no
RPC message. In this case, the chunk lists provide information
to allow the message proper to be transferred using RDMA Read
or Write and thus is not appended to the RPC-over-RDMA header.
o RDMA_MSGP = 2 indicates that a chunk list and RPC message with
some padding follow.
o RDMA_DONE = 3 indicates that the message signals the completion
of a chunk transfer via RDMA Read.
o RDMA_ERROR = 4 is used to signal any detected error(s) in the
RPC RDMA chunk encoding.
Because the version number is encoded as part of this header, and the
RDMA_ERROR message type is used to indicate errors, these first four
fields and the start of the following message body MUST always remain
aligned at these fixed offsets for all versions of the RPC-over-RDMA
header.
For a message of type RDMA_MSG or RDMA_NOMSG, the Read and Write
chunk lists follow. If the Read chunk list is null (a 32-bit word of
zeros), then there are no chunks to be transferred separately and the
RPC message follows in its entirety. If non-null, then it's the
beginning of an XDR encoded sequence of Read chunk list entries. If
the Write chunk list is non-null, then an XDR encoded sequence of
Write chunk entries follows.
If the message type is RDMA_MSGP, then two additional fields that
specify the padding alignment and threshold are inserted prior to the
Read and Write chunk lists.
A header of message type RDMA_MSG or RDMA_MSGP MUST be followed by
the RPC call or RPC reply message body, beginning with the XID. The
XID in the RDMA_MSG or RDMA_MSGP header MUST match this.
   +--------+---------+---------+-----------+-------------+----------
   |        |         |         |  Message  |    NULLs    | RPC Call
   |  XID   | Version | Credits |   Type    |     or      |    or
   |        |         |         |           | Chunk Lists | Reply Msg
   +--------+---------+---------+-----------+-------------+----------
Note that in the case of RDMA_DONE and RDMA_ERROR, no chunk list or
RPC message follows. As an implementation hint: a gather operation
on the Send of the RDMA RPC message can be used to marshal the
initial header, the chunk list, and the RPC message itself.
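As a non-normative illustration of that hint, a sender could describe
the Send with a three-element gather list rather than copying the
pieces into one contiguous buffer. The sketch below uses a POSIX
iovec purely for illustration; how a gather list is actually handed to
an RDMA provider is outside the scope of this protocol.

   #include <stddef.h>
   #include <sys/uio.h>   /* struct iovec (POSIX) */

   /* Describe the Send as three gathered pieces: the fixed
    * RPC-over-RDMA header, the encoded chunk lists, and the RPC
    * message itself. */
   static void build_send_gather(struct iovec iov[3],
                                 void *hdr,    size_t hdr_len,
                                 void *chunks, size_t chunks_len,
                                 void *rpcmsg, size_t rpcmsg_len)
   {
       iov[0].iov_base = hdr;     iov[0].iov_len = hdr_len;
       iov[1].iov_base = chunks;  iov[1].iov_len = chunks_len;
       iov[2].iov_base = rpcmsg;  iov[2].iov_len = rpcmsg_len;
   }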
4.2. RPC-over-RDMA Header Errors
When a peer receives an RPC RDMA message, it MUST perform the
following basic validity checks on the header and chunk contents. If
such errors are detected in the request, an RDMA_ERROR reply MUST be
generated.
Two types of errors are defined, version mismatch and invalid chunk
format. When the peer detects an RPC-over-RDMA header version that
it does not support (currently this document defines only version 1),
it replies with an error code of ERR_VERS, and provides the low and
high inclusive version numbers it does, in fact, support. The
version number in this reply MUST be a value otherwise valid at the
receiver. When other decoding errors are detected in the header or
chunks, either an RPC decode error MAY be returned or the RPC/RDMA
error code ERR_CHUNK MUST be returned.
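For example, a receiver that implements only version 1 might build an
ERR_VERS reply along the lines of the following non-normative sketch.
The in-memory structure is hypothetical; the on-the-wire form is the
XDR encoding of Section 4.3.

   #include <stdint.h>

   /* Hypothetical in-memory image of an RDMA_ERROR / ERR_VERS reply. */
   struct err_vers_reply {
       uint32_t rdma_xid;       /* copied from the offending message   */
       uint32_t rdma_vers;      /* a version valid at this receiver: 1 */
       uint32_t rdma_credit;    /* granted credits                     */
       uint32_t rdma_proc;      /* RDMA_ERROR = 4                      */
       uint32_t rdma_err;       /* ERR_VERS = 1                        */
       uint32_t rdma_vers_low;  /* lowest supported version: 1         */
       uint32_t rdma_vers_high; /* highest supported version: 1        */
   };

   static void fill_err_vers(struct err_vers_reply *r,
                             uint32_t xid, uint32_t credits)
   {
       r->rdma_xid       = xid;
       r->rdma_vers      = 1;
       r->rdma_credit    = credits;
       r->rdma_proc      = 4;   /* RDMA_ERROR */
       r->rdma_err       = 1;   /* ERR_VERS   */
       r->rdma_vers_low  = 1;
       r->rdma_vers_high = 1;
   }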
4.3. XDR Language Description
Here is the message layout in XDR language.
   struct xdr_rdma_segment {
      uint32 handle;        /* Registered memory handle */
      uint32 length;        /* Length of the chunk in bytes */
      uint64 offset;        /* Chunk virtual address or offset */
   };

   struct xdr_read_chunk {
      uint32 position;      /* Position in XDR stream */
      struct xdr_rdma_segment target;
   };

   struct xdr_read_list {
      struct xdr_read_chunk entry;
      struct xdr_read_list *next;
   };
   struct xdr_write_chunk {
      struct xdr_rdma_segment target<>;
   };

   struct xdr_write_list {
      struct xdr_write_chunk entry;
      struct xdr_write_list *next;
   };

   struct rdma_msg {
      uint32 rdma_xid;      /* Mirrors the RPC header xid */
      uint32 rdma_vers;     /* Version of this protocol */
      uint32 rdma_credit;   /* Buffers requested/granted */
      rdma_body rdma_body;
   };

   enum rdma_proc {
      RDMA_MSG=0,           /* An RPC call or reply msg */
      RDMA_NOMSG=1,         /* An RPC call or reply msg - separate body */
      RDMA_MSGP=2,          /* An RPC call or reply msg with padding */
      RDMA_DONE=3,          /* Client signals reply completion */
      RDMA_ERROR=4          /* An RPC RDMA encoding error */
   };

   union rdma_body switch (rdma_proc proc) {
      case RDMA_MSG:
        rpc_rdma_header rdma_msg;
      case RDMA_NOMSG:
        rpc_rdma_header_nomsg rdma_nomsg;
      case RDMA_MSGP:
        rpc_rdma_header_padded rdma_msgp;
      case RDMA_DONE:
        void;
      case RDMA_ERROR:
        rpc_rdma_error rdma_error;
   };

   struct rpc_rdma_header {
      struct xdr_read_list   *rdma_reads;
      struct xdr_write_list  *rdma_writes;
      struct xdr_write_chunk *rdma_reply;
      /* rpc body follows */
   };

   struct rpc_rdma_header_nomsg {
      struct xdr_read_list   *rdma_reads;
      struct xdr_write_list  *rdma_writes;
      struct xdr_write_chunk *rdma_reply;
   };

   struct rpc_rdma_header_padded {
      uint32 rdma_align;    /* Padding alignment */
      uint32 rdma_thresh;   /* Padding threshold */
      struct xdr_read_list   *rdma_reads;
      struct xdr_write_list  *rdma_writes;
      struct xdr_write_chunk *rdma_reply;
      /* rpc body follows */
   };

   enum rpc_rdma_errcode {
      ERR_VERS = 1,
      ERR_CHUNK = 2
   };

   union rpc_rdma_error switch (rpc_rdma_errcode err) {
      case ERR_VERS:
        uint32 rdma_vers_low;
        uint32 rdma_vers_high;
      case ERR_CHUNK:
        void;
      default:
        uint32 rdma_extra[8];
   };
5. Long Messages
The receiver of RDMA Send messages is required by RDMA to have
previously posted one or more adequately sized buffers. The RPC
client can inform the server of the maximum size of its RDMA Send
messages via the Connection Configuration Protocol described later in
this document.
Since RPC messages are frequently small, memory savings can be
achieved by posting small buffers. Even large messages like NFS READ
or WRITE will be quite small once the chunks are removed from the
message. However, there may be large messages that would demand a
very large buffer be posted, where the contents of the buffer may not
be a chunkable XDR element. A good example is an NFS READDIR reply,
which may contain a large number of small filename strings. Also,
the NFS version 4 protocol [RFC3530] features COMPOUND request and
reply messages of unbounded length.
Ideally, each upper layer will negotiate these limits. However, it
is frequently necessary to provide a transparent solution.
5.1. Message as an RDMA Read Chunk
One relatively simple method is to have the client identify any RPC
message that exceeds the RPC server's posted buffer size and move it
separately as a chunk, i.e., reference it as the first entry in the
read chunk list with an XDR position of zero.
   Normal Message

   +--------+---------+---------+------------+-------------+----------
   |        |         |         |            |             |  RPC Call
   |  XID   | Version | Credits |  RDMA_MSG  | Chunk Lists |     or
   |        |         |         |            |             |  Reply Msg
   +--------+---------+---------+------------+-------------+----------

   Long Message

   +--------+---------+---------+------------+-------------+
   |        |         |         |            |             |
   |  XID   | Version | Credits | RDMA_NOMSG | Chunk Lists |
   |        |         |         |            |             |
   +--------+---------+---------+------------+-------------+
                                                  |
                                                  |  +----------
                                                  |  | Long RPC Call
                                                  +->|     or
                                                     | Reply Message
                                                     +----------
If the receiver gets an RPC-over-RDMA header with a message type of
RDMA_NOMSG and finds an initial read chunk list entry with a zero XDR
position, it allocates a registered buffer and issues an RDMA Read of
the long RPC message into it. The receiver then proceeds to XDR
decode the RPC message as if it had received it inline with the Send
data. Further decoding may issue additional RDMA Reads to bring over
additional chunks.
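As a non-normative illustration of that receiver logic, the sketch
below assumes "rdma_read" and "decode_rpc" callbacks supplied by the
local RDMA verbs layer and RPC runtime; those names, and the
attributes of the decoded header object, are assumptions of the
example rather than part of this specification.

   RDMA_NOMSG = 1   # from the rdma_proc enum in Section 4.3

   def handle_inbound(header, inline_bytes, rdma_read, decode_rpc):
       # 'header' is the already-decoded RPC-over-RDMA header.
       if (header.proc == RDMA_NOMSG and header.read_chunks
               and header.read_chunks[0].position == 0):
           seg = header.read_chunks[0].target
           # Pull the long RPC message into a locally registered buffer.
           rpc_bytes = rdma_read(seg.handle, seg.offset, seg.length)
       else:
           rpc_bytes = inline_bytes
       # Decoding may trigger further RDMA Reads for chunks at nonzero
       # XDR positions.
       return decode_rpc(rpc_bytes, header.read_chunks)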
Although the handling of long messages requires one extra network
turnaround, in practice these messages will be rare if the posted
receive buffers are correctly sized, and of course they will be
non-existent for RDMA-aware upper layers.
   A long call RPC with request supplied via RDMA Read

   RPC Client                               RPC Server
       |          RPC-over-RDMA Header           |
  Send |   ------------------------------>       |
       |                                         |
       |            Long RPC Call Msg            |
       |   +------------------------------       | Read
       |   v----------------------------->       |
       |                                         |
       |           RPC-over-RDMA Reply           |
       |   <------------------------------       | Send
   An RPC with long reply returned via RDMA Read

   RPC Client                               RPC Server
       |                RPC Call                 |
  Send |   ------------------------------>       |
       |                                         |
       |          RPC-over-RDMA Header           |
       |   <------------------------------       | Send
       |                                         |
       |           Long RPC Reply Msg            |
  Read |   ------------------------------+       |
       |   <-----------------------------v       |
       |                                         |
       |                  Done                   |
  Send |   ------------------------------>       |
It is possible for a single RPC procedure to employ both a long call
for its arguments and a long reply for its results. However, such an
operation is atypical, as few upper layers define such exchanges.
5.2. RDMA Write of Long Replies (Reply Chunks)
A superior method of handling long RPC replies is to have the RPC
client post a large buffer into which the server can write a large
RPC reply. This has the advantage that an RDMA Write may be slightly
faster in network latency than an RDMA Read, and does not require the
server to wait for the completion as it must for RDMA Read.
Additionally, for a reply it removes the need for an RDMA_DONE
message if the large reply is returned as a Read chunk.
This protocol supports direct return of a large reply via the
inclusion of an OPTIONAL rdma_reply write chunk after the read chunk
list and the write chunk list. The client allocates a buffer sized
to receive a large reply and enters its steering tag, address and
length in the rdma_reply write chunk. If the reply message is too
long to return inline with an RDMA Send (exceeds the size of the
client's posted receive buffer), even with read chunks removed, then
the RPC server performs an RDMA Write of the RPC reply message into
the buffer indicated by the rdma_reply chunk.  If the client does not
provide an rdma_reply chunk, or if the chunk is too small, the message
MAY instead be returned as a Read chunk, provided the upper-layer
specification permits it.
   An RPC with long reply returned via RDMA Write

   RPC Client                               RPC Server
       |        RPC Call with rdma_reply         |
  Send |   ------------------------------>       |
       |                                         |
       |           Long RPC Reply Msg            |
       |   <------------------------------       | Write
       |                                         |
       |          RPC-over-RDMA Header           |
       |   <------------------------------       | Send
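The server's choice among these reply mechanisms can be summarized by
the following non-normative sketch; the parameter names are
illustrative, and the thresholds are those described above.

   def choose_reply_path(reply_len, client_max_inline, reply_chunk_len,
                         upper_layer_allows_read_chunk):
       # reply_chunk_len is None when the client supplied no rdma_reply chunk.
       if reply_len <= client_max_inline:
           return "inline Send (RDMA_MSG)"
       if reply_chunk_len is not None and reply_chunk_len >= reply_len:
           return "RDMA Write into rdma_reply chunk, then Send of RDMA_NOMSG header"
       if upper_layer_allows_read_chunk:
           return "reply as Read chunk (client issues RDMA Read, then RDMA_DONE)"
       return "no transfer method available"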
The use of RDMA Write to return long replies requires that the client
application anticipate a long reply and have some knowledge of its
size so that an adequately sized buffer can be allocated.  This is
certainly true of NFS READDIR replies, where the client already
provides an upper bound on the size of the encoded directory fragment
to be returned by the server.
The use of these "reply chunks" is highly efficient and convenient
for both RPC client and server. Their use is encouraged for eligible
RPC operations such as NFS READDIR, which would otherwise require
extensive chunk management within the results or use of RDMA Read and
a Done message [RFC5667].
6. Connection Configuration Protocol
RDMA Send operations require the receiver to post one or more buffers
at the RDMA connection endpoint, each large enough to receive the
largest Send message. Buffers are consumed as Send messages are
received. If a buffer is too small, or if there are no buffers
posted, the RDMA transport MAY return an error and break the RDMA
connection.  The receiver MUST post sufficient, adequately sized
buffers to avoid buffer overrun or capacity errors.
The protocol described above includes only a mechanism for managing
the number of such receive buffers and no explicit features to allow
the RPC client and server to provision or control buffer sizing, nor
any other session parameters.
In the past, this type of connection management has not been
necessary for RPC. RPC over UDP or TCP does not have a protocol to
negotiate the link. The server can get a rough idea of the maximum
size of messages from the server protocol code. However, a protocol
to negotiate transport features on a more dynamic basis is desirable.
The Connection Configuration Protocol allows the client to pass its
connection requirements to the server, and allows the server to
inform the client of its connection limits.
Use of the Connection Configuration Protocol by an upper layer is
OPTIONAL.
6.1. Initial Connection State
This protocol MAY be used for connection setup prior to the use of
another RPC protocol that uses the RDMA transport. It operates
in-band, i.e., it uses the connection itself to negotiate the
connection parameters. To provide a basis for connection
negotiation, the connection is assumed to provide a basic level of
interoperability: the ability to exchange at least one RPC message at
a time that is at least 1 KB in size. The server MAY exceed this
basic level of configuration, but the client MUST NOT assume more than
this basic level, and MUST receive a valid reply from the server
carrying the actual number of available receive messages prior to
sending its next request.
6.2. Protocol Description
Version 1 of the Connection Configuration Protocol consists of a
single procedure that allows the client to inform the server of its
connection requirements and the server to return connection
information to the client.
The maxcall_sendsize argument is the maximum size of an RPC call
message that the client MAY send inline in an RDMA Send message to
the server. The server MAY return a maxcall_sendsize value that is
smaller or larger than the client's request. The client MUST NOT
send an inline call message larger than what the server will accept.
The maxcall_sendsize limits only the size of inline RPC calls. It
does not limit the size of long RPC messages transferred as an
initial chunk in the Read chunk list.
The maxreply_sendsize is the maximum size of an inline RPC message
that the client will accept from the server.
The maxrdmaread is the maximum number of RDMA Reads that may be
active at the peer.  This number correlates to the incoming RDMA Read
count ("IRD") configured into each originating endpoint by the
client or server. If more than this number of RDMA Read operations
by the connected peer are issued simultaneously, connection loss or
suboptimal flow control may result; therefore, the value SHOULD be
observed at all times. The peers' values need not be equal. If
zero, the peer MUST NOT issue requests that require RDMA Read to
satisfy, as no transfer will be possible.
The align value is the value recommended by the server for opaque
data values such as strings and counted byte arrays. The client MAY
use this value to compute the number of prepended pad bytes when XDR
encoding opaque values in the RPC call message.
   typedef unsigned int uint32;

   struct config_rdma_req {
      uint32 maxcall_sendsize;   /* max size of inline RPC call */
      uint32 maxreply_sendsize;  /* max size of inline RPC reply */
      uint32 maxrdmaread;        /* max active RDMA Reads at client */
   };

   struct config_rdma_reply {
      uint32 maxcall_sendsize;   /* max call size accepted by server */
      uint32 align;              /* server's receive buffer alignment */
      uint32 maxrdmaread;        /* max active RDMA Reads at server */
   };

   program CONFIG_RDMA_PROG {
      version VERS1 {
         /*
          * Config call/reply
          */
         config_rdma_reply CONF_RDMA(config_rdma_req) = 1;
      } = 1;
   } = 100417;
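As a non-normative illustration, the argument and result structures
above each reduce to three big-endian 32-bit words on the wire.  The
sketch below packs a request and interprets a reply; the surrounding
ONC RPC call envelope for program 100417, version 1, procedure 1 is
assumed to be supplied by the local RPC runtime.

   import struct

   def encode_config_rdma_req(maxcall_sendsize, maxreply_sendsize, maxrdmaread):
       # Three uint32 fields, XDR (big-endian) encoded.
       return struct.pack(">III", maxcall_sendsize, maxreply_sendsize, maxrdmaread)

   def apply_config_rdma_reply(reply_bytes):
       maxcall_sendsize, align, maxrdmaread = struct.unpack(">III", reply_bytes)
       return {
           # The client MUST NOT send an inline call larger than this.
           "inline_call_limit": maxcall_sendsize,
           # Recommended alignment for padding opaque XDR data.
           "pad_align": align,
           # Upper bound on RDMA Reads the client may have active at the server.
           "max_outstanding_rdma_reads": maxrdmaread,
       }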
7. Memory Registration Overhead
RDMA requires that all data be transferred between registered memory
regions at the source and destination. All protocol headers as well
as separately transferred data chunks use registered memory. Since
the cost of registering and de-registering memory can be a large
proportion of the RDMA transaction cost, it is important to minimize
registration activity. This is easily achieved within RPC controlled
memory by allocating chunk list data and RPC headers in a reusable
way from pre-registered pools.
The data chunks transferred via RDMA MAY occupy memory that persists
outside the bounds of the RPC transaction. Hence, the default
behavior of an RPC-over-RDMA transport is to register and de-register
these chunks on every transaction. However, this is not a limitation
of the protocol -- only of the existing local RPC API. The API is
easily extended through such functions as rpc_control(3) to change
the default behavior so that the application can assume
responsibility for controlling memory registration through an RPC-
provided registered memory allocator.
8. Errors and Error Recovery
RPC RDMA protocol errors are described in Section 4. RPC errors and
RPC error recovery are not affected by the protocol, and proceed as
for any RPC error condition. RDMA transport error reporting and
recovery are outside the scope of this protocol.
It is assumed that the link itself will provide some degree of error
detection and retransmission. iWARP's Marker PDU Aligned (MPA) layer
(when used over TCP), Stream Control Transmission Protocol (SCTP), as
well as the InfiniBand link layer all provide Cyclic Redundancy Check
(CRC) protection of the RDMA payload, and CRC-class protection is a
general attribute of such transports. Additionally, the RPC layer
itself can accept errors from the link level and recover via
retransmission. RPC recovery can handle complete loss and
re-establishment of the link.
See Section 11 for further discussion of the use of RPC-level
integrity schemes to detect errors and related efficiency issues.
9. Node Addressing
In setting up a new RDMA connection, the first action by an RPC
client will be to obtain a transport address for the server. The
mechanism used to obtain this address, and to open an RDMA connection
is dependent on the type of RDMA transport, and is the responsibility
of each RPC protocol binding and its local implementation.
10. RPC Binding
RPC services normally register with a portmap or rpcbind [RFC1833]
service, which associates an RPC program number with a service
address. (In the case of UDP or TCP, the service address for NFS is
normally port 2049.) This policy is no different with RDMA
interconnects, although it may require the allocation of port numbers
appropriate to each upper-layer binding that uses the RPC framing
defined here.
When mapped atop the iWARP [RFC5040, RFC5041] transport, which uses
IP port addressing due to its layering on TCP and/or SCTP, port
mapping is trivial and consists merely of issuing the port in the
connection process. The NFS/RDMA protocol service address has been
assigned port 20049 by IANA, for both iWARP/TCP and iWARP/SCTP.
When mapped atop InfiniBand [IB], which uses a Group Identifier
(GID)-based service endpoint naming scheme, a translation MUST be
employed. One such translation is defined in the InfiniBand Port
Addressing Annex [IBPORT], which is appropriate for translating IP
port addressing to the InfiniBand network. Therefore, in this case,
IP port addressing may be readily employed by the upper layer.
When a mapping standard or convention exists for IP ports on an RDMA
interconnect, there are several possibilities for each upper layer to
consider:
One possibility is to have an upper-layer server register its
mapped IP port with the rpcbind service, under the netid (or
netid's) defined here. An RPC/RDMA-aware client can then resolve
its desired service to a mappable port, and proceed to connect.
This is the most flexible and compatible approach, for those upper
layers that are defined to use the rpcbind service.
A second possibility is to have the server's portmapper register
itself on the RDMA interconnect at a "well known" service address.
(On UDP or TCP, this corresponds to port 111.) A client could
connect to this service address and use the portmap protocol to
obtain a service address in response to a program number, e.g., an
iWARP port number, or an InfiniBand GID.
Alternatively, the client could simply connect to the mapped well-
known port for the service itself, if it is appropriately defined.
By convention, the NFS/RDMA service, when operating atop such an
InfiniBand fabric, will use the same 20049 assignment as for
iWARP.
Historically, different RPC protocols have taken different approaches
to their port assignment; therefore, the specific method is left to
each RPC/RDMA-enabled upper-layer binding, and not addressed here.
In Section 12, "IANA Considerations", this specification defines two
new "netid" values, to be used for registration of upper layers atop
iWARP [RFC5040, RFC5041] and (when a suitable port translation
service is available) InfiniBand [IB]. Additional RDMA-capable
networks MAY define their own netids, or if they provide a port
translation, MAY share the one defined here.
11. Security Considerations
RPC provides its own security via the RPCSEC_GSS framework [RFC2203].
RPCSEC_GSS can provide message authentication, integrity checking,
and privacy. This security mechanism will be unaffected by the RDMA
transport. The data integrity and privacy features alter the body of
the message, presenting it as a single chunk. For large messages the
chunk may be large enough to qualify for RDMA Read transfer.
However, there is much data movement associated with computation and
verification of integrity, or encryption/decryption, so certain
performance advantages may be lost.
For efficiency, a more appropriate security mechanism for RDMA links
may be link-level protection, such as certain configurations of
IPsec, which may be co-located in the RDMA hardware. The use of
link-level protection MAY be negotiated through the use of the new
RPCSEC_GSS mechanism defined in [RFC5403] in conjunction with the
Channel Binding mechanism [RFC5056] and IPsec Channel Connection
Latching [RFC5660]. Use of such mechanisms is REQUIRED where
integrity and/or privacy is desired, and where efficiency is
required.
An additional consideration is the protection of the integrity and
privacy of local memory by the RDMA transport itself. The use of
RDMA by RPC MUST NOT introduce any vulnerabilities to system memory
contents, or to memory owned by user processes. These protections
are provided by the RDMA layer specifications, and specifically their
security models. It is REQUIRED that any RDMA provider used for RPC
transport be conformant to the requirements of [RFC5042] in order to
satisfy these protections.
Once delivered securely by the RDMA provider, any RDMA-exposed
addresses will contain only RPC payloads in the chunk lists,
transferred under the protection of RPCSEC_GSS integrity and privacy.
By these means, the data will be protected end-to-end, as required by
the RPC layer security model.
Where upper-layer protocols choose to supply results to the requester
via read chunks, a server resource deficit can arise if the client
does not promptly acknowledge their status via the RDMA_DONE message.
This can potentially lead to a denial-of-service situation, with a
single client unfairly (and unnecessarily) consuming server RDMA
resources. Servers for such upper-layer protocols MUST protect
against this situation, originating from one or many clients. For
example, a time-based window of buffer availability may be offered: if
the client fails to obtain the data within the window, it will simply
retry using ordinary RPC retry semantics.  Or, a more severe
method would be for the server to simply close the client's RDMA
connection, freeing the RDMA resources and allowing the server to
reclaim them.
A fairer and more useful method is provided by the protocol itself.
The server MAY use the rdma_credit value to limit the number of
outstanding requests for each client. By including the number of
outstanding RDMA_DONE completions in the computation of available
client credits, the server can limit its exposure to each client, and
therefore provide uninterrupted service as its resources permit.
However, the server must ensure that it does not decrease the credit
count to zero with this method, since the RDMA_DONE message is not
acknowledged. If the credit count were to drop to zero solely due to
outstanding RDMA_DONE messages, the client would deadlock since it
would never obtain a new credit with which to continue. Therefore,
if the server adjusts credits to zero for outstanding RDMA_DONE, it
MUST withhold its reply to at least one message in order to provide
the next credit. The time-based window (or any other appropriate
method) SHOULD be used by the server to recover resources in the
event that the client never returns.
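One simple, non-normative credit policy consistent with the
requirement above is sketched below; the names and the specific
formula are illustrative, not mandated by this document.

   def advertised_credit(configured_limit, outstanding_rdma_done):
       # Reduce the advertised credit by the number of RDMA_DONE messages
       # still owed by the client, but never advertise zero on that basis
       # alone: RDMA_DONE is unacknowledged, so a zero grant caused only
       # by outstanding RDMA_DONE messages would deadlock the client.
       credit = configured_limit - outstanding_rdma_done
       return max(credit, 1)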
The Connection Configuration Protocol, when used, MUST be protected
by an appropriate RPC security flavor, to ensure it is not attacked
in the process of initiating an RPC/RDMA connection.
12. IANA Considerations
Three new assignments are specified by this document:
- A new set of RPC "netids" for resolving RPC/RDMA services
- Optional service port assignments for upper-layer bindings
- An RPC program number assignment for the configuration protocol
These assignments have been established, as below.
The new RPC transport has been assigned an RPC "netid", which is an
rpcbind [RFC1833] string used to describe the underlying protocol in
order for RPC to select the appropriate transport framing, as well as
the format of the service addresses and ports.
The following "Netid" registry strings are defined for this purpose:
NC_RDMA "rdma"
NC_RDMA6 "rdma6"
These netids MAY be used for any RDMA network satisfying the
requirements of Section 2, and able to identify service endpoints
using IP port addressing, possibly through use of a translation
service as described above in Section 10, "RPC Binding". The "rdma"
netid is to be used when IPv4 addressing is employed by the
underlying transport, and "rdma6" for IPv6 addressing.
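As a non-normative illustration, and assuming the conventional IPv4
universal address form h1.h2.h3.h4.p1.p2 used by rpcbind for
IP-port-addressed netids [RFC1833] [RFC5665], a service address for
the NFS/RDMA port could be formed as follows; the helper name is
hypothetical.

   def rdma_uaddr_ipv4(ip_dotted, port=20049):
       # p1/p2 are the high and low octets of the port number.
       return "%s.%d.%d" % (ip_dotted, port >> 8, port & 0xFF)

   # Registering NFS/RDMA on 192.0.2.10 under netid "rdma":
   # rdma_uaddr_ipv4("192.0.2.10")  ->  "192.0.2.10.78.81"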
The netid assignment policy and registry are defined in [RFC5665].
As a new RPC transport, this protocol has no effect on RPC program
numbers or existing registered port numbers. However, new port
numbers MAY be registered for use by RPC/RDMA-enabled services, as
appropriate to the new networks over which the services will operate.
For example, the NFS/RDMA service defined in [RFC5667] has been
assigned the port 20049, in the IANA registry:
nfsrdma 20049/tcp Network File System (NFS) over RDMA
nfsrdma 20049/udp Network File System (NFS) over RDMA
nfsrdma 20049/sctp Network File System (NFS) over RDMA
The OPTIONAL Connection Configuration Protocol described herein
requires an RPC program number assignment. The value "100417" has
been assigned:
rdmaconfig 100417 rpc.rdmaconfig
The RPC program number assignment policy and registry are defined in
[RFC5531].
13. Acknowledgments
The authors wish to thank Rob Thurlow, John Howard, Chet Juszczak,
Alex Chiu, Peter Staubach, Dave Noveck, Brian Pawlowski, Steve
Kleiman, Mike Eisler, Mark Wittle, Shantanu Mehendale, David
Robinson, and Mallikarjun Chadalapaka for their contributions to this
document.
14. References
14.1. Normative References
[RFC1833] Srinivasan, R., "Binding Protocols for ONC RPC Version 2",
RFC 1833, August 1995.
[RFC2203] Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol
Specification", RFC 2203, September 1997.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4506] Eisler, M., Ed., "XDR: External Data Representation
Standard", STD 67, RFC 4506, May 2006.
[RFC5042] Pinkerton, J. and E. Deleganes, "Direct Data Placement
Protocol (DDP) / Remote Direct Memory Access Protocol
(RDMAP) Security", RFC 5042, October 2007.
[RFC5056] Williams, N., "On the Use of Channel Bindings to Secure
Channels", RFC 5056, November 2007.
[RFC5403] Eisler, M., "RPCSEC_GSS Version 2", RFC 5403, February
2009.
[RFC5531] Thurlow, R., "RPC: Remote Procedure Call Protocol
Specification Version 2", RFC 5531, May 2009.
[RFC5660] Williams, N., "IPsec Channels: Connection Latching", RFC
5660, October 2009.
[RFC5665] Eisler, M., "IANA Considerations for Remote Procedure Call
(RPC) Network Identifiers and Universal Address Formats",
RFC 5665, January 2010.
14.2. Informative References
[RFC1094] Sun Microsystems, "NFS: Network File System Protocol
specification", RFC 1094, March 1989.
[RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS
Version 3 Protocol Specification", RFC 1813, June 1995.
[RFC3530] Shepler, S., Callaghan, B., Robinson, D., Thurlow, R.,
Beame, C., Eisler, M., and D. Noveck, "Network File System
(NFS) version 4 Protocol", RFC 3530, April 2003.
[RFC5040] Recio, R., Metzler, B., Culley, P., Hilland, J., and D.
Garcia, "A Remote Direct Memory Access Protocol
Specification", RFC 5040, October 2007.
[RFC5041] Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct
Data Placement over Reliable Transports", RFC 5041,
October 2007.
[RFC5532] Talpey, T. and C. Juszczak, "Network File System (NFS)
Remote Direct Memory Access (RDMA) Problem Statement", RFC
5532, May 2009.
[RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
"Network File System Version 4 Minor Version 1 Protocol",
RFC 5661, January 2010.
[RFC5667] Talpey, T. and B. Callaghan, "Network File System (NFS)
Direct Data Placement", RFC 5667, January 2010.
[IB] InfiniBand Trade Association, InfiniBand Architecture
Specifications, available from
http://www.infinibandta.org.
[IBPORT] InfiniBand Trade Association, "IP Addressing Annex",
available from http://www.infinibandta.org.
Authors' Addresses
Tom Talpey
170 Whitman St.
Stow, MA 01775 USA
EMail: tmtalpey@gmail.com
Brent Callaghan
Apple Computer, Inc.
MS: 302-4K
2 Infinite Loop
Cupertino, CA 95014 USA
EMail: brentc@apple.com