author     Thomas Voss <mail@thomasvoss.com>   2024-11-27 20:54:24 +0100
committer  Thomas Voss <mail@thomasvoss.com>   2024-11-27 20:54:24 +0100
commit     4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
tree       e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc8435.txt
parent     ea76e11061bda059ae9f9ad130a9895cc85607db (diff)
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc8435.txt')
-rw-r--r--  doc/rfc/rfc8435.txt  2355
1 file changed, 2355 insertions, 0 deletions
diff --git a/doc/rfc/rfc8435.txt b/doc/rfc/rfc8435.txt new file mode 100644 index 0000000..02b0b8d --- /dev/null +++ b/doc/rfc/rfc8435.txt @@ -0,0 +1,2355 @@ + + + + + + +Internet Engineering Task Force (IETF) B. Halevy +Request for Comments: 8435 +Category: Standards Track T. Haynes +ISSN: 2070-1721 Hammerspace + August 2018 + + + Parallel NFS (pNFS) Flexible File Layout + +Abstract + + Parallel NFS (pNFS) allows a separation between the metadata (onto a + metadata server) and data (onto a storage device) for a file. The + flexible file layout type is defined in this document as an extension + to pNFS that allows the use of storage devices that require only a + limited degree of interaction with the metadata server and use + already-existing protocols. Client-side mirroring is also added to + provide replication of files. + +Status of This Memo + + This is an Internet Standards Track document. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Further information on + Internet Standards is available in Section 2 of RFC 7841. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + https://www.rfc-editor.org/info/rfc8435. + +Copyright Notice + + Copyright (c) 2018 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + + + + +Halevy & Haynes Standards Track [Page 1] + +RFC 8435 pNFS Flexible File Layout August 2018 + + +Table of Contents + + 1. Introduction ....................................................3 + 1.1. Definitions ................................................4 + 1.2. Requirements Language ......................................6 + 2. Coupling of Storage Devices .....................................6 + 2.1. LAYOUTCOMMIT ...............................................7 + 2.2. Fencing Clients from the Storage Device ....................7 + 2.2.1. Implementation Notes for Synthetic uids/gids ........8 + 2.2.2. Example of Using Synthetic uids/gids ................9 + 2.3. State and Locking Models ..................................10 + 2.3.1. Loosely Coupled Locking Model ......................11 + 2.3.2. Tightly Coupled Locking Model ......................12 + 3. XDR Description of the Flexible File Layout Type ...............13 + 3.1. Code Components Licensing Notice ..........................14 + 4. Device Addressing and Discovery ................................16 + 4.1. ff_device_addr4 ...........................................16 + 4.2. Storage Device Multipathing ...............................17 + 5. Flexible File Layout Type ......................................18 + 5.1. ff_layout4 ................................................19 + 5.1.1. Error Codes from LAYOUTGET .........................23 + 5.1.2. 
Client Interactions with FF_FLAGS_NO_IO_THRU_MDS ...23 + 5.2. LAYOUTCOMMIT ..............................................24 + 5.3. Interactions between Devices and Layouts ..................24 + 5.4. Handling Version Errors ...................................24 + 6. Striping via Sparse Mapping ....................................25 + 7. Recovering from Client I/O Errors ..............................25 + 8. Mirroring ......................................................26 + 8.1. Selecting a Mirror ........................................26 + 8.2. Writing to Mirrors ........................................27 + 8.2.1. Single Storage Device Updates Mirrors ..............27 + 8.2.2. Client Updates All Mirrors .........................27 + 8.2.3. Handling Write Errors ..............................28 + 8.2.4. Handling Write COMMITs .............................28 + 8.3. Metadata Server Resilvering of the File ...................29 + 9. Flexible File Layout Type Return ...............................29 + 9.1. I/O Error Reporting .......................................30 + 9.1.1. ff_ioerr4 ..........................................30 + 9.2. Layout Usage Statistics ...................................31 + 9.2.1. ff_io_latency4 .....................................31 + 9.2.2. ff_layoutupdate4 ...................................32 + 9.2.3. ff_iostats4 ........................................33 + 9.3. ff_layoutreturn4 ..........................................34 + 10. Flexible File Layout Type LAYOUTERROR .........................35 + 11. Flexible File Layout Type LAYOUTSTATS .........................35 + 12. Flexible File Layout Type Creation Hint .......................35 + 12.1. ff_layouthint4 ...........................................35 + 13. Recalling a Layout ............................................36 + + + +Halevy & Haynes Standards Track [Page 2] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + 13.1. CB_RECALL_ANY ............................................36 + 14. Client Fencing ................................................37 + 15. Security Considerations .......................................37 + 15.1. RPCSEC_GSS and Security Services .........................39 + 15.1.1. Loosely Coupled ...................................39 + 15.1.2. Tightly Coupled ...................................39 + 16. IANA Considerations ...........................................39 + 17. References ....................................................40 + 17.1. Normative References .....................................40 + 17.2. Informative References ...................................41 + Acknowledgments ...................................................42 + Authors' Addresses ................................................42 + +1. Introduction + + In Parallel NFS (pNFS), the metadata server returns layout type + structures that describe where file data is located. There are + different layout types for different storage systems and methods of + arranging data on storage devices. This document defines the + flexible file layout type used with file-based data servers that are + accessed using the NFS protocols: NFSv3 [RFC1813], NFSv4.0 [RFC7530], + NFSv4.1 [RFC5661], and NFSv4.2 [RFC7862]. + + To provide a global state model equivalent to that of the files + layout type, a back-end control protocol might be implemented between + the metadata server and NFSv4.1+ storage devices. 
An implementation + can either define its own proprietary mechanism or it could define a + control protocol in a Standards Track document. The requirements for + a control protocol are specified in [RFC5661] and clarified in + [RFC8434]. + + The control protocol described in this document is based on NFS. It + does not provide for knowledge of stateids to be passed between the + metadata server and the storage devices. Instead, the storage + devices are configured such that the metadata server has full access + rights to the data file system and then the metadata server uses + synthetic ids to control client access to individual files. + + In traditional mirroring of data, the server is responsible for + replicating, validating, and repairing copies of the data file. With + client-side mirroring, the metadata server provides a layout that + presents the available mirrors to the client. The client then picks + a mirror to read from and ensures that all writes go to all mirrors. + The client only considers the write transaction to have succeeded if + all mirrors are successfully updated. In case of error, the client + can use the LAYOUTERROR operation to inform the metadata server, + which is then responsible for the repairing of the mirrored copies of + the file. + + + +Halevy & Haynes Standards Track [Page 3] + +RFC 8435 pNFS Flexible File Layout August 2018 + + +1.1. Definitions + + control communication requirements: the specification for + information on layouts, stateids, file metadata, and file data + that must be communicated between the metadata server and the + storage devices. There is a separate set of requirements for each + layout type. + + control protocol: the particular mechanism that an implementation of + a layout type would use to meet the control communication + requirement for that layout type. This need not be a protocol as + normally understood. In some cases, the same protocol may be used + as a control protocol and storage protocol. + + client-side mirroring: a feature in which the client, not the + server, is responsible for updating all of the mirrored copies of + a layout segment. + + (file) data: that part of the file system object that contains the + data to be read or written. It is the contents of the object + rather than the attributes of the object. + + data server (DS): a pNFS server that provides the file's data when + the file system object is accessed over a file-based protocol. + + fencing: the process by which the metadata server prevents the + storage devices from processing I/O from a specific client to a + specific file. + + file layout type: a layout type in which the storage devices are + accessed via the NFS protocol (see Section 13 of [RFC5661]). + + gid: the group id, a numeric value that identifies to which group a + file belongs. + + layout: the information a client uses to access file data on a + storage device. This information includes specification of the + protocol (layout type) and the identity of the storage devices to + be used. + + layout iomode: a grant of either read-only or read/write I/O to the + client. + + layout segment: a sub-division of a layout. That sub-division might + be by the layout iomode (see Sections 3.3.20 and 12.2.9 of + [RFC5661]), a striping pattern (see Section 13.3 of [RFC5661]), or + requested byte range. 
+ + + + +Halevy & Haynes Standards Track [Page 4] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + layout stateid: a 128-bit quantity returned by a server that + uniquely defines the layout state provided by the server for a + specific layout that describes a layout type and file (see + Section 12.5.2 of [RFC5661]). Further, Section 12.5.3 of + [RFC5661] describes differences in handling between layout + stateids and other stateid types. + + layout type: a specification of both the storage protocol used to + access the data and the aggregation scheme used to lay out the + file data on the underlying storage devices. + + loose coupling: when the control protocol is a storage protocol. + + (file) metadata: the part of the file system object that contains + various descriptive data relevant to the file object, as opposed + to the file data itself. This could include the time of last + modification, access time, EOF position, etc. + + metadata server (MDS): the pNFS server that provides metadata + information for a file system object. It is also responsible for + generating, recalling, and revoking layouts for file system + objects, for performing directory operations, and for performing + I/O operations to regular files when the clients direct these to + the metadata server itself. + + mirror: a copy of a layout segment. Note that if one copy of the + mirror is updated, then all copies must be updated. + + recalling a layout: a graceful recall, via a callback, of a specific + layout by the metadata server to the client. Graceful here means + that the client would have the opportunity to flush any WRITEs, + etc., before returning the layout to the metadata server. + + revoking a layout: an invalidation of a specific layout by the + metadata server. Once revocation occurs, the metadata server will + not accept as valid any reference to the revoked layout, and a + storage device will not accept any client access based on the + layout. + + resilvering: the act of rebuilding a mirrored copy of a layout + segment from a known good copy of the layout segment. Note that + this can also be done to create a new mirrored copy of the layout + segment. + + rsize: the data transfer buffer size used for READs. + + + + + + +Halevy & Haynes Standards Track [Page 5] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + stateid: a 128-bit quantity returned by a server that uniquely + defines the set of locking-related state provided by the server. + Stateids may designate state related to open files, byte-range + locks, delegations, or layouts. + + storage device: the target to which clients may direct I/O requests + when they hold an appropriate layout. See Section 2.1 of + [RFC8434] for further discussion of the difference between a data + server and a storage device. + + storage protocol: the protocol used by clients to do I/O operations + to the storage device. Each layout type specifies the set of + storage protocols. + + tight coupling: an arrangement in which the control protocol is one + designed specifically for control communication. It may be either + a proprietary protocol adapted specifically to a particular + metadata server or a protocol based on a Standards Track document. + + uid: the user id, a numeric value that identifies which user owns a + file. + + wsize: the data transfer buffer size used for WRITEs. + +1.2. 
Requirements Language + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and + "OPTIONAL" in this document are to be interpreted as described in + BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all + capitals, as shown here. + +2. Coupling of Storage Devices + + A server implementation may choose either a loosely coupled model or + a tightly coupled model between the metadata server and the storage + devices. [RFC8434] describes the general problems facing pNFS + implementations. This document details how the new flexible file + layout type addresses these issues. To implement the tightly coupled + model, a control protocol has to be defined. As the flexible file + layout imposes no special requirements on the client, the control + protocol will need to provide: + + (1) management of both security and LAYOUTCOMMITs and + + (2) a global stateid model and management of these stateids. + + + + + +Halevy & Haynes Standards Track [Page 6] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + When implementing the loosely coupled model, the only control + protocol will be a version of NFS, with no ability to provide a + global stateid model or to prevent clients from using layouts + inappropriately. To enable client use in that environment, this + document will specify how security, state, and locking are to be + managed. + +2.1. LAYOUTCOMMIT + + Regardless of the coupling model, the metadata server has the + responsibility, upon receiving a LAYOUTCOMMIT (see Section 18.42 of + [RFC5661]) to ensure that the semantics of pNFS are respected (see + Section 3.1 of [RFC8434]). These do include a requirement that data + written to a data storage device be stable before the occurrence of + the LAYOUTCOMMIT. + + It is the responsibility of the client to make sure the data file is + stable before the metadata server begins to query the storage devices + about the changes to the file. If any WRITE to a storage device did + not result with stable_how equal to FILE_SYNC, a LAYOUTCOMMIT to the + metadata server MUST be preceded by a COMMIT to the storage devices + written to. Note that if the client has not done a COMMIT to the + storage device, then the LAYOUTCOMMIT might not be synchronized to + the last WRITE operation to the storage device. + +2.2. Fencing Clients from the Storage Device + + With loosely coupled storage devices, the metadata server uses + synthetic uids (user ids) and gids (group ids) for the data file, + where the uid owner of the data file is allowed read/write access and + the gid owner is allowed read-only access. As part of the layout + (see ffds_user and ffds_group in Section 5.1), the client is provided + with the user and group to be used in the Remote Procedure Call (RPC) + [RFC5531] credentials needed to access the data file. Fencing off of + clients is achieved by the metadata server changing the synthetic uid + and/or gid owners of the data file on the storage device to + implicitly revoke the outstanding RPC credentials. A client + presenting the wrong credential for the desired access will get an + NFS4ERR_ACCESS error. + + With this loosely coupled model, the metadata server is not able to + fence off a single client; it is forced to fence off all clients. + However, as the other clients react to the fencing, returning their + layouts and trying to get new ones, the metadata server can hand out + a new uid and gid to allow access. 
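   As a non-normative illustration of this fencing scheme (the helper
   names and id values are inventions of this sketch, and a local
   POSIX chown stands in for whatever interface, such as an NFS
   SETATTR, the metadata server actually uses to reach the storage
   device's file system), the metadata server logic might look like:

      import os

      # Ids reserved for fencing; they are never handed out to any
      # client in ffds_user/ffds_group.
      RESTRICTED_UID = 65533
      RESTRICTED_GID = 65533

      def grant_access(data_file: str, uid: int, gid: int) -> None:
          # The uid owner gets read/write and the gid owner gets
          # read-only access, i.e., -rw-r-----.
          os.chown(data_file, uid, gid)
          os.chmod(data_file, 0o640)

      def fence_all_clients(data_file: str) -> None:
          # Rotating the synthetic owners implicitly revokes every
          # outstanding RPC credential built from the old uid/gid;
          # further client I/O fails with NFS4ERR_ACCESS.
          os.chown(data_file, RESTRICTED_UID, RESTRICTED_GID)

   As fenced clients return their layouts and request new ones, the
   metadata server can call grant_access() again with freshly chosen
   synthetic ids.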
+ + + + + + +Halevy & Haynes Standards Track [Page 7] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + It is RECOMMENDED to implement common access control methods at the + storage device file system to allow only the metadata server root + (super user) access to the storage device and to set the owner of all + directories holding data files to the root user. This approach + provides a practical model to enforce access control and fence off + cooperative clients, but it cannot protect against malicious clients; + hence, it provides a level of security equivalent to AUTH_SYS. It is + RECOMMENDED that the communication between the metadata server and + storage device be secure from eavesdroppers and man-in-the-middle + protocol tampering. The security measure could be physical security + (e.g., the servers are co-located in a physically secure area), + encrypted communications, or some other technique. + + With tightly coupled storage devices, the metadata server sets the + user and group owners, mode bits, and Access Control List (ACL) of + the data file to be the same as the metadata file. And the client + must authenticate with the storage device and go through the same + authorization process it would go through via the metadata server. + In the case of tight coupling, fencing is the responsibility of the + control protocol and is not described in detail in this document. + However, implementations of the tightly coupled locking model (see + Section 2.3) will need a way to prevent access by certain clients to + specific files by invalidating the corresponding stateids on the + storage device. In such a scenario, the client will be given an + error of NFS4ERR_BAD_STATEID. + + The client need not know the model used between the metadata server + and the storage device. It need only react consistently to any + errors in interacting with the storage device. It should both return + the layout and error to the metadata server and ask for a new layout. + At that point, the metadata server can either hand out a new layout, + hand out no layout (forcing the I/O through it), or deny the client + further access to the file. + +2.2.1. Implementation Notes for Synthetic uids/gids + + The selection method for the synthetic uids and gids to be used for + fencing in loosely coupled storage devices is strictly an + implementation issue. That is, an administrator might restrict a + range of such ids available to the Lightweight Directory Access + Protocol (LDAP) 'uid' field [RFC4519]. The administrator might also + be able to choose an id that would never be used to grant access. + Then, when the metadata server had a request to access a file, a + SETATTR would be sent to the storage device to set the owner and + group of the data file. The user and group might be selected in a + round-robin fashion from the range of available ids. + + + + + +Halevy & Haynes Standards Track [Page 8] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + Those ids would be sent back as ffds_user and ffds_group to the + client, who would present them as the RPC credentials to the storage + device. When the client is done accessing the file and the metadata + server knows that no other client is accessing the file, it can reset + the owner and group to restrict access to the data file. + + When the metadata server wants to fence off a client, it changes the + synthetic uid and/or gid to the restricted ids. 
Note that using a + restricted id ensures that there is a change of owner and at least + one id available that never gets allowed access. + + Under an AUTH_SYS security model, synthetic uids and gids of 0 SHOULD + be avoided. These typically either grant super access to files on a + storage device or are mapped to an anonymous id. In the first case, + even if the data file is fenced, the client might still be able to + access the file. In the second case, multiple ids might be mapped to + the anonymous ids. + +2.2.2. Example of Using Synthetic uids/gids + + The user loghyr creates a file "ompha.c" on the metadata server, + which then creates a corresponding data file on the storage device. + + The metadata server entry may look like: + + -rw-r--r-- 1 loghyr staff 1697 Dec 4 11:31 ompha.c + + On the storage device, the file may be assigned some unpredictable + synthetic uid/gid to deny access: + + -rw-r----- 1 19452 28418 1697 Dec 4 11:31 data_ompha.c + + When the file is opened on a client and accessed, the user will try + to get a layout for the data file. Since the layout knows nothing + about the user (and does not care), it does not matter whether the + user loghyr or garbo opens the file. The client has to present an + uid of 19452 to get write permission. If it presents any other value + for the uid, then it must give a gid of 28418 to get read access. + + Further, if the metadata server decides to fence the file, it should + change the uid and/or gid such that these values neither match + earlier values for that file nor match a predictable change based on + an earlier fencing. + + -rw-r----- 1 19453 28419 1697 Dec 4 11:31 data_ompha.c + + + + + + +Halevy & Haynes Standards Track [Page 9] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + The set of synthetic gids on the storage device should be selected + such that there is no mapping in any of the name services used by the + storage device, i.e., each group should have no members. + + If the layout segment has an iomode of LAYOUTIOMODE4_READ, then the + metadata server should return a synthetic uid that is not set on the + storage device. Only the synthetic gid would be valid. + + The client is thus solely responsible for enforcing file permissions + in a loosely coupled model. To allow loghyr write access, it will + send an RPC to the storage device with a credential of 1066:1067. To + allow garbo read access, it will send an RPC to the storage device + with a credential of 1067:1067. The value of the uid does not matter + as long as it is not the synthetic uid granted when getting the + layout. + + While pushing the enforcement of permission checking onto the client + may seem to weaken security, the client may already be responsible + for enforcing permissions before modifications are sent to a server. + With cached writes, the client is always responsible for tracking who + is modifying a file and making sure to not coalesce requests from + multiple users into one request. + +2.3. State and Locking Models + + An implementation can always be deployed as a loosely coupled model. + There is, however, no way for a storage device to indicate over an + NFS protocol that it can definitively participate in a tightly + coupled model: + + o Storage devices implementing the NFSv3 and NFSv4.0 protocols are + always treated as loosely coupled. + + o NFSv4.1+ storage devices that do not return the + EXCHGID4_FLAG_USE_PNFS_DS flag set to EXCHANGE_ID are indicating + that they are to be treated as loosely coupled. 
From the locking + viewpoint, they are treated in the same way as NFSv4.0 storage + devices. + + o NFSv4.1+ storage devices that do identify themselves with the + EXCHGID4_FLAG_USE_PNFS_DS flag set to EXCHANGE_ID can potentially + be tightly coupled. They would use a back-end control protocol to + implement the global stateid model as described in [RFC5661]. + + A storage device would have to be either discovered or advertised + over the control protocol to enable a tightly coupled model. + + + + + +Halevy & Haynes Standards Track [Page 10] + +RFC 8435 pNFS Flexible File Layout August 2018 + + +2.3.1. Loosely Coupled Locking Model + + When locking-related operations are requested, they are primarily + dealt with by the metadata server, which generates the appropriate + stateids. When an NFSv4 version is used as the data access protocol, + the metadata server may make stateid-related requests of the storage + devices. However, it is not required to do so, and the resulting + stateids are known only to the metadata server and the storage + device. + + Given this basic structure, locking-related operations are handled as + follows: + + o OPENs are dealt with by the metadata server. Stateids are + selected by the metadata server and associated with the client ID + describing the client's connection to the metadata server. The + metadata server may need to interact with the storage device to + locate the file to be opened, but no locking-related functionality + need be used on the storage device. + + OPEN_DOWNGRADE and CLOSE only require local execution on the + metadata server. + + o Advisory byte-range locks can be implemented locally on the + metadata server. As in the case of OPENs, the stateids associated + with byte-range locks are assigned by the metadata server and only + used on the metadata server. + + o Delegations are assigned by the metadata server that initiates + recalls when conflicting OPENs are processed. No storage device + involvement is required. + + o TEST_STATEID and FREE_STATEID are processed locally on the + metadata server, without storage device involvement. + + All I/O operations to the storage device are done using the anonymous + stateid. Thus, the storage device has no information about the + openowner and lockowner responsible for issuing a particular I/O + operation. As a result: + + o Mandatory byte-range locking cannot be supported because the + storage device has no way of distinguishing I/O done on behalf of + the lock owner from those done by others. + + o Enforcement of share reservations is the responsibility of the + client. Even though I/O is done using the anonymous stateid, the + client must ensure that it has a valid stateid associated with the + openowner. + + + +Halevy & Haynes Standards Track [Page 11] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + In the event that a stateid is revoked, the metadata server is + responsible for preventing client access, since it has no way of + being sure that the client is aware that the stateid in question has + been revoked. + + As the client never receives a stateid generated by a storage device, + there is no client lease on the storage device and no prospect of + lease expiration, even when access is via NFSv4 protocols. Clients + will have leases on the metadata server. In dealing with lease + expiration, the metadata server may need to use fencing to prevent + revoked stateids from being relied upon by a client unaware of the + fact that they have been revoked. + +2.3.2. 
Tightly Coupled Locking Model + + When locking-related operations are requested, they are primarily + dealt with by the metadata server, which generates the appropriate + stateids. These stateids must be made known to the storage device + using control protocol facilities, the details of which are not + discussed in this document. + + Given this basic structure, locking-related operations are handled as + follows: + + o OPENs are dealt with primarily on the metadata server. Stateids + are selected by the metadata server and associated with the client + ID describing the client's connection to the metadata server. The + metadata server needs to interact with the storage device to + locate the file to be opened and to make the storage device aware + of the association between the metadata-server-chosen stateid and + the client and openowner that it represents. + + OPEN_DOWNGRADE and CLOSE are executed initially on the metadata + server, but the state change made must be propagated to the + storage device. + + o Advisory byte-range locks can be implemented locally on the + metadata server. As in the case of OPENs, the stateids associated + with byte-range locks are assigned by the metadata server and are + available for use on the metadata server. Because I/O operations + are allowed to present lock stateids, the metadata server needs + the ability to make the storage device aware of the association + between the metadata-server-chosen stateid and the corresponding + open stateid it is associated with. + + o Mandatory byte-range locks can be supported when both the metadata + server and the storage devices have the appropriate support. As + in the case of advisory byte-range locks, these are assigned by + + + +Halevy & Haynes Standards Track [Page 12] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + the metadata server and are available for use on the metadata + server. To enable mandatory lock enforcement on the storage + device, the metadata server needs the ability to make the storage + device aware of the association between the metadata-server-chosen + stateid and the client, openowner, and lock (i.e., lockowner, + byte-range, and lock-type) that it represents. Because I/O + operations are allowed to present lock stateids, this information + needs to be propagated to all storage devices to which I/O might + be directed rather than only to storage device that contain the + locked region. + + o Delegations are assigned by the metadata server that initiates + recalls when conflicting OPENs are processed. Because I/O + operations are allowed to present delegation stateids, the + metadata server requires the ability (1) to make the storage + device aware of the association between the metadata-server-chosen + stateid and the filehandle and delegation type it represents and + (2) to break such an association. + + o TEST_STATEID is processed locally on the metadata server, without + storage device involvement. + + o FREE_STATEID is processed on the metadata server, but the metadata + server requires the ability to propagate the request to the + corresponding storage devices. + + Because the client will possess and use stateids valid on the storage + device, there will be a client lease on the storage device, and the + possibility of lease expiration does exist. The best approach for + the storage device is to retain these locks as a courtesy. 
However, + if it does not do so, control protocol facilities need to provide the + means to synchronize lock state between the metadata server and + storage device. + + Clients will also have leases on the metadata server that are subject + to expiration. In dealing with lease expiration, the metadata server + would be expected to use control protocol facilities enabling it to + invalidate revoked stateids on the storage device. In the event the + client is not responsive, the metadata server may need to use fencing + to prevent revoked stateids from being acted upon by the storage + device. + +3. XDR Description of the Flexible File Layout Type + + This document contains the External Data Representation (XDR) + [RFC4506] description of the flexible file layout type. The XDR + description is embedded in this document in a way that makes it + simple for the reader to extract into a ready-to-compile form. The + + + +Halevy & Haynes Standards Track [Page 13] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + reader can feed this document into the following shell script to + produce the machine-readable XDR description of the flexible file + layout type: + + <CODE BEGINS> + + #!/bin/sh + grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??' + + <CODE ENDS> + + That is, if the above script is stored in a file called "extract.sh" + and this document is in a file called "spec.txt", then the reader can + do: + + sh extract.sh < spec.txt > flex_files_prot.x + + The effect of the script is to remove leading white space from each + line, plus a sentinel sequence of "///". + + The embedded XDR file header follows. Subsequent XDR descriptions + with the sentinel sequence are embedded throughout the document. + + Note that the XDR code contained in this document depends on types + from the NFSv4.1 nfs4_prot.x file [RFC5662]. This includes both nfs + types that end with a 4, such as offset4, length4, etc., as well as + more generic types such as uint32_t and uint64_t. + +3.1. Code Components Licensing Notice + + Both the XDR description and the scripts used for extracting the XDR + description are Code Components as described in Section 4 of "Trust + Legal Provisions (TLP)" [LEGAL]. These Code Components are licensed + according to the terms of that document. + + <CODE BEGINS> + + /// /* + /// * Copyright (c) 2018 IETF Trust and the persons identified + /// * as authors of the code. All rights reserved. + /// * + /// * Redistribution and use in source and binary forms, with + /// * or without modification, are permitted provided that the + /// * following conditions are met: + /// * + /// * - Redistributions of source code must retain the above + /// * copyright notice, this list of conditions and the + /// * following disclaimer. + + + +Halevy & Haynes Standards Track [Page 14] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + /// * + /// * - Redistributions in binary form must reproduce the above + /// * copyright notice, this list of conditions and the + /// * following disclaimer in the documentation and/or other + /// * materials provided with the distribution. + /// * + /// * - Neither the name of Internet Society, IETF or IETF + /// * Trust, nor the names of specific contributors, may be + /// * used to endorse or promote products derived from this + /// * software without specific prior written permission. 
+ /// * + /// * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS + /// * AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED + /// * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + /// * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS + /// * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO + /// * EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + /// * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, + /// * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + /// * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + /// * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + /// * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + /// * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + /// * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING + /// * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF + /// * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + /// * + /// * This code was derived from RFC 8435. + /// * Please reproduce this note if possible. + /// */ + /// + /// /* + /// * flex_files_prot.x + /// */ + /// + /// /* + /// * The following include statements are for example only. + /// * The actual XDR definition files are generated separately + /// * and independently and are likely to have a different name. + /// * %#include <nfsv42.x> + /// * %#include <rpc_prot.x> + /// */ + /// + + <CODE ENDS> + + + + + + +Halevy & Haynes Standards Track [Page 15] + +RFC 8435 pNFS Flexible File Layout August 2018 + + +4. Device Addressing and Discovery + + Data operations to a storage device require the client to know the + network address of the storage device. The NFSv4.1+ GETDEVICEINFO + operation (Section 18.40 of [RFC5661]) is used by the client to + retrieve that information. + +4.1. ff_device_addr4 + + The ff_device_addr4 data structure is returned by the server as the + layout-type-specific opaque field da_addr_body in the device_addr4 + structure by a successful GETDEVICEINFO operation. + + <CODE BEGINS> + + /// struct ff_device_versions4 { + /// uint32_t ffdv_version; + /// uint32_t ffdv_minorversion; + /// uint32_t ffdv_rsize; + /// uint32_t ffdv_wsize; + /// bool ffdv_tightly_coupled; + /// }; + /// + + /// struct ff_device_addr4 { + /// multipath_list4 ffda_netaddrs; + /// ff_device_versions4 ffda_versions<>; + /// }; + /// + + <CODE ENDS> + + The ffda_netaddrs field is used to locate the storage device. It + MUST be set by the server to a list holding one or more of the device + network addresses. + + The ffda_versions array allows the metadata server to present choices + as to NFS version, minor version, and coupling strength to the + client. The ffdv_version and ffdv_minorversion represent the NFS + protocol to be used to access the storage device. This layout + specification defines the semantics for ffdv_versions 3 and 4. If + ffdv_version equals 3, then the server MUST set ffdv_minorversion to + 0 and ffdv_tightly_coupled to false. The client MUST then access the + storage device using the NFSv3 protocol [RFC1813]. If ffdv_version + equals 4, then the server MUST set ffdv_minorversion to one of the + NFSv4 minor version numbers, and the client MUST access the storage + device using NFSv4 with the specified minor version. 
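   As a non-normative sketch of this negotiation (the Python types
   are stand-ins for ff_device_versions4, and the set of protocols
   the client speaks is an assumption of the example), a client might
   select a usable entry from ffda_versions as follows:

      from dataclasses import dataclass
      from typing import Optional

      @dataclass
      class FFDeviceVersion:       # stand-in for ff_device_versions4
          version: int             # ffdv_version
          minorversion: int        # ffdv_minorversion
          rsize: int               # ffdv_rsize
          wsize: int               # ffdv_wsize
          tightly_coupled: bool    # ffdv_tightly_coupled

      # NFS protocol versions this particular client can speak.
      SUPPORTED = {(3, 0), (4, 0), (4, 1), (4, 2)}

      def choose_version(
              versions: list[FFDeviceVersion]) -> Optional[FFDeviceVersion]:
          # Return the first combination the client can use; None
          # means the device is version incompatible (Section 5.4).
          for v in versions:
              if (v.version, v.minorversion) in SUPPORTED:
                  return v
          return None

   If the chosen entry has ffdv_tightly_coupled set to false, the
   client must additionally remember to commit its WRITEs to the
   storage device before any LAYOUTCOMMIT, as described below.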
+ + + + +Halevy & Haynes Standards Track [Page 16] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + Note that while the client might determine that it cannot use any of + the configured combinations of ffdv_version, ffdv_minorversion, and + ffdv_tightly_coupled, when it gets the device list from the metadata + server, there is no way to indicate to the metadata server as to + which device it is version incompatible. However, if the client + waits until it retrieves the layout from the metadata server, it can + at that time clearly identify the storage device in question (see + Section 5.4). + + The ffdv_rsize and ffdv_wsize are used to communicate the maximum + rsize and wsize supported by the storage device. As the storage + device can have a different rsize or wsize than the metadata server, + the ffdv_rsize and ffdv_wsize allow the metadata server to + communicate that information on behalf of the storage device. + + ffdv_tightly_coupled informs the client as to whether or not the + metadata server is tightly coupled with the storage devices. Note + that even if the data protocol is at least NFSv4.1, it may still be + the case that there is loose coupling in effect. If + ffdv_tightly_coupled is not set, then the client MUST commit writes + to the storage devices for the file before sending a LAYOUTCOMMIT to + the metadata server. That is, the writes MUST be committed by the + client to stable storage via issuing WRITEs with stable_how == + FILE_SYNC or by issuing a COMMIT after WRITEs with stable_how != + FILE_SYNC (see Section 3.3.7 of [RFC1813]). + +4.2. Storage Device Multipathing + + The flexible file layout type supports multipathing to multiple + storage device addresses. Storage-device-level multipathing is used + for bandwidth scaling via trunking and for higher availability of use + in the event of a storage device failure. Multipathing allows the + client to switch to another storage device address that may be that + of another storage device that is exporting the same data stripe + unit, without having to contact the metadata server for a new layout. + + To support storage device multipathing, ffda_netaddrs contains an + array of one or more storage device network addresses. This array + (data type multipath_list4) represents a list of storage devices + (each identified by a network address), with the possibility that + some storage device will appear in the list multiple times. + + The client is free to use any of the network addresses as a + destination to send storage device requests. If some network + addresses are less desirable paths to the data than others, then the + metadata server SHOULD NOT include those network addresses in + ffda_netaddrs. If less desirable network addresses exist to provide + failover, the RECOMMENDED method to offer the addresses is to provide + + + +Halevy & Haynes Standards Track [Page 17] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + them in a replacement device-ID-to-device-address mapping or a + replacement device ID. When a client finds no response from the + storage device using all addresses available in ffda_netaddrs, it + SHOULD send a GETDEVICEINFO to attempt to replace the existing + device-ID-to-device-address mappings. 
If the metadata server detects + that all network paths represented by ffda_netaddrs are unavailable, + the metadata server SHOULD send a CB_NOTIFY_DEVICEID (if the client + has indicated it wants device ID notifications for changed device + IDs) to change the device-ID-to-device-address mappings to the + available addresses. If the device ID itself will be replaced, the + metadata server SHOULD recall all layouts with the device ID and thus + force the client to get new layouts and device ID mappings via + LAYOUTGET and GETDEVICEINFO. + + Generally, if two network addresses appear in ffda_netaddrs, they + will designate the same storage device. When the storage device is + accessed over NFSv4.1 or a higher minor version, the two storage + device addresses will support the implementation of client ID or + session trunking (the latter is RECOMMENDED) as defined in [RFC5661]. + The two storage device addresses will share the same server owner or + major ID of the server owner. It is not always necessary for the two + storage device addresses to designate the same storage device with + trunking being used. For example, the data could be read-only, and + the data consist of exact replicas. + +5. Flexible File Layout Type + + The original layouttype4 introduced in [RFC5662] is modified to be: + + <CODE BEGINS> + + enum layouttype4 { + LAYOUT4_NFSV4_1_FILES = 1, + LAYOUT4_OSD2_OBJECTS = 2, + LAYOUT4_BLOCK_VOLUME = 3, + LAYOUT4_FLEX_FILES = 4 + }; + + struct layout_content4 { + layouttype4 loc_type; + opaque loc_body<>; + }; + + + + + + + + + +Halevy & Haynes Standards Track [Page 18] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + struct layout4 { + offset4 lo_offset; + length4 lo_length; + layoutiomode4 lo_iomode; + layout_content4 lo_content; + }; + + <CODE ENDS> + + This document defines structures associated with the layouttype4 + value LAYOUT4_FLEX_FILES. [RFC5661] specifies the loc_body structure + as an XDR type "opaque". The opaque layout is uninterpreted by the + generic pNFS client layers but is interpreted by the flexible file + layout type implementation. This section defines the structure of + this otherwise opaque value, ff_layout4. + +5.1. ff_layout4 + + <CODE BEGINS> + + /// const FF_FLAGS_NO_LAYOUTCOMMIT = 0x00000001; + /// const FF_FLAGS_NO_IO_THRU_MDS = 0x00000002; + /// const FF_FLAGS_NO_READ_IO = 0x00000004; + /// const FF_FLAGS_WRITE_ONE_MIRROR = 0x00000008; + + /// typedef uint32_t ff_flags4; + /// + + /// struct ff_data_server4 { + /// deviceid4 ffds_deviceid; + /// uint32_t ffds_efficiency; + /// stateid4 ffds_stateid; + /// nfs_fh4 ffds_fh_vers<>; + /// fattr4_owner ffds_user; + /// fattr4_owner_group ffds_group; + /// }; + /// + + /// struct ff_mirror4 { + /// ff_data_server4 ffm_data_servers<>; + /// }; + /// + + /// struct ff_layout4 { + /// length4 ffl_stripe_unit; + /// ff_mirror4 ffl_mirrors<>; + /// ff_flags4 ffl_flags; + /// uint32_t ffl_stats_collect_hint; + + + +Halevy & Haynes Standards Track [Page 19] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + /// }; + /// + + <CODE ENDS> + + The ff_layout4 structure specifies a layout in that portion of the + data file described in the current layout segment. It is either a + single instance or a set of mirrored copies of that portion of the + data file. When mirroring is in effect, it protects against loss of + data in layout segments. 
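   To make the nesting of these structures concrete, the following
   non-normative sketch models a layout segment with Python stand-ins
   for the XDR types above and applies the sparse mapping arithmetic
   of Section 6 to find which element of ffm_data_servers serves a
   given logical offset:

      from dataclasses import dataclass

      @dataclass
      class FFDataServer:          # ff_data_server4 (abridged)
          deviceid: bytes          # ffds_deviceid
          efficiency: int          # ffds_efficiency

      @dataclass
      class FFMirror:              # ff_mirror4
          data_servers: list[FFDataServer]   # ffm_data_servers

      @dataclass
      class FFLayout:              # ff_layout4 (abridged)
          stripe_unit: int         # ffl_stripe_unit
          mirrors: list[FFMirror]  # ffl_mirrors

      def stripe_index(layout: FFLayout, offset: int) -> int:
          # Sparse mapping: the logical offset is also the physical
          # offset on the storage device; only the target device
          # varies.  With a single stripe, ffl_stripe_unit defaults
          # to zero, so guard against dividing by it.
          w = len(layout.mirrors[0].data_servers)   # stripe width W
          if w == 1:
              return 0
          return (offset // layout.stripe_unit) % w

   Because the stripe unit and the number of stripes are the same
   across all mirrors (see the description of ffl_stripe_unit below),
   the same index selects the corresponding storage device in every
   mirror.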
+ + While not explicitly shown in the above XDR, each layout4 element + returned in the logr_layout array of LAYOUTGET4res (see + Section 18.43.2 of [RFC5661]) describes a layout segment. Hence, + each ff_layout4 also describes a layout segment. It is possible that + the file is concatenated from more than one layout segment. Each + layout segment MAY represent different striping parameters. + + The ffl_stripe_unit field is the stripe unit size in use for the + current layout segment. The number of stripes is given inside each + mirror by the number of elements in ffm_data_servers. If the number + of stripes is one, then the value for ffl_stripe_unit MUST default to + zero. The only supported mapping scheme is sparse and is detailed in + Section 6. Note that there is an assumption here that both the + stripe unit size and the number of stripes are the same across all + mirrors. + + The ffl_mirrors field is the array of mirrored storage devices that + provide the storage for the current stripe; see Figure 1. + + The ffl_stats_collect_hint field provides a hint to the client on how + often the server wants it to report LAYOUTSTATS for a file. The time + is in seconds. + + + + + + + + + + + + + + + + + + +Halevy & Haynes Standards Track [Page 20] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + +-----------+ + | | + | | + | File | + | | + | | + +-----+-----+ + | + +------------+------------+ + | | + +----+-----+ +-----+----+ + | Mirror 1 | | Mirror 2 | + +----+-----+ +-----+----+ + | | + +-----------+ +-----------+ + |+-----------+ |+-----------+ + ||+-----------+ ||+-----------+ + +|| Storage | +|| Storage | + +| Devices | +| Devices | + +-----------+ +-----------+ + + Figure 1 + + The ffs_mirrors field represents an array of state information for + each mirrored copy of the current layout segment. Each element is + described by a ff_mirror4 type. + + ffds_deviceid provides the deviceid of the storage device holding the + data file. + + ffds_fh_vers is an array of filehandles of the data file matching the + available NFS versions on the given storage device. There MUST be + exactly as many elements in ffds_fh_vers as there are in + ffda_versions. Each element of the array corresponds to a particular + combination of ffdv_version, ffdv_minorversion, and + ffdv_tightly_coupled provided for the device. The array allows for + server implementations that have different filehandles for different + combinations of version, minor version, and coupling strength. See + Section 5.4 for how to handle versioning issues between the client + and storage devices. + + For tight coupling, ffds_stateid provides the stateid to be used by + the client to access the file. For loose coupling and an NFSv4 + storage device, the client will have to use an anonymous stateid to + perform I/O on the storage device. With no control protocol, the + metadata server stateid cannot be used to provide a global stateid + model. Thus, the server MUST set the ffds_stateid to be the + anonymous stateid. + + + +Halevy & Haynes Standards Track [Page 21] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + This specification of the ffds_stateid restricts both models for + NFSv4.x storage protocols: + + loosely coupled model: the stateid has to be an anonymous stateid + + tightly coupled model: the stateid has to be a global stateid + + A number of issues stem from a mismatch between the fact that + ffds_stateid is defined as a single item while ffds_fh_vers is + defined as an array. 
It is possible for each open file on the + storage device to require its own open stateid. Because there are + established loosely coupled implementations of the version of the + protocol described in this document, such potential issues have not + been addressed here. It is possible for future layout types to be + defined that address these issues, should it become important to + provide multiple stateids for the same underlying file. + + For loosely coupled storage devices, ffds_user and ffds_group provide + the synthetic user and group to be used in the RPC credentials that + the client presents to the storage device to access the data files. + For tightly coupled storage devices, the user and group on the + storage device will be the same as on the metadata server; that is, + if ffdv_tightly_coupled (see Section 4.1) is set, then the client + MUST ignore both ffds_user and ffds_group. + + The allowed values for both ffds_user and ffds_group are specified as + owner and owner_group, respectively, in Section 5.9 of [RFC5661]. + For NFSv3 compatibility, user and group strings that consist of + decimal numeric values with no leading zeros can be given a special + interpretation by clients and servers that choose to provide such + support. The receiver may treat such a user or group string as + representing the same user as would be represented by an NFSv3 uid or + gid having the corresponding numeric value. Note that if using + Kerberos for security, the expectation is that these values will be a + name@domain string. + + ffds_efficiency describes the metadata server's evaluation as to the + effectiveness of each mirror. Note that this is per layout and not + per device as the metric may change due to perceived load, + availability to the metadata server, etc. Higher values denote + higher perceived utility. The way the client can select the best + mirror to access is discussed in Section 8.1. + + ffl_flags is a bitmap that allows the metadata server to inform the + client of particular conditions that may result from more or less + tight coupling of the storage devices. + + + + + +Halevy & Haynes Standards Track [Page 22] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + FF_FLAGS_NO_LAYOUTCOMMIT: can be set to indicate that the client is + not required to send LAYOUTCOMMIT to the metadata server. + + FF_FLAGS_NO_IO_THRU_MDS: can be set to indicate that the client + should not send I/O operations to the metadata server. That is, + even if the client could determine that there was a network + disconnect to a storage device, the client should not try to proxy + the I/O through the metadata server. + + FF_FLAGS_NO_READ_IO: can be set to indicate that the client should + not send READ requests with the layouts of iomode + LAYOUTIOMODE4_RW. Instead, it should request a layout of iomode + LAYOUTIOMODE4_READ from the metadata server. + + FF_FLAGS_WRITE_ONE_MIRROR: can be set to indicate that the client + only needs to update one of the mirrors (see Section 8.2). + +5.1.1. Error Codes from LAYOUTGET + + [RFC5661] provides little guidance as to how the client is to proceed + with a LAYOUTGET that returns an error of either + NFS4ERR_LAYOUTTRYLATER, NFS4ERR_LAYOUTUNAVAILABLE, and NFS4ERR_DELAY. + Within the context of this document: + + NFS4ERR_LAYOUTUNAVAILABLE: there is no layout available and the I/O + is to go to the metadata server. Note that it is possible to have + had a layout before a recall and not after. 
+ + NFS4ERR_LAYOUTTRYLATER: there is some issue preventing the layout + from being granted. If the client already has an appropriate + layout, it should continue with I/O to the storage devices. + + NFS4ERR_DELAY: there is some issue preventing the layout from being + granted. If the client already has an appropriate layout, it + should not continue with I/O to the storage devices. + +5.1.2. Client Interactions with FF_FLAGS_NO_IO_THRU_MDS + + Even if the metadata server provides the FF_FLAGS_NO_IO_THRU_MDS + flag, the client can still perform I/O to the metadata server. The + flag functions as a hint: it indicates to the client that the + metadata server prefers to separate the metadata I/O from the data + I/O, most likely for performance reasons. + + + + + + + + +Halevy & Haynes Standards Track [Page 23] + +RFC 8435 pNFS Flexible File Layout August 2018 + + +5.2. LAYOUTCOMMIT + + The flexible file layout does not use lou_body inside the + loca_layoutupdate argument to LAYOUTCOMMIT. If lou_type is + LAYOUT4_FLEX_FILES, the lou_body field MUST have a zero length (see + Section 18.42.1 of [RFC5661]). + +5.3. Interactions between Devices and Layouts + + In [RFC5661], the file layout type is defined such that the + relationship between multipathing and filehandles can result in + either 0, 1, or N filehandles (see Section 13.3). Some rationales + for this are clustered servers that share the same filehandle or + allow for multiple read-only copies of the file on the same storage + device. In the flexible file layout type, while there is an array of + filehandles, they are independent of the multipathing being used. If + the metadata server wants to provide multiple read-only copies of the + same file on the same storage device, then it should provide multiple + mirrored instances, each with a different ff_device_addr4. The + client can then determine that, since each of the ffds_fh_vers + is different, there are multiple copies of the file for the current + layout segment available. + +5.4. Handling Version Errors + + When the metadata server provides the ffda_versions array in the + ff_device_addr4 (see Section 4.1), the client is able to determine + whether or not it can access a storage device with any of the + supplied combinations of ffdv_version, ffdv_minorversion, and + ffdv_tightly_coupled. However, due to the limitations of reporting + errors in GETDEVICEINFO (see Section 18.40 in [RFC5661]), the client + is not able to specify which specific device it cannot communicate + with over one of the provided ffdv_version and ffdv_minorversion + combinations. Using ff_ioerr4 (see Section 9.1.1) inside either the + LAYOUTRETURN (see Section 18.44 of [RFC5661]) or the LAYOUTERROR (see + Section 15.6 of [RFC7862] and Section 10 of this document), the + client can isolate the problematic storage device. + + The error code to return for LAYOUTRETURN and/or LAYOUTERROR is + NFS4ERR_MINOR_VERS_MISMATCH. It does not matter whether the mismatch + is a major version (e.g., client can use NFSv3 but not NFSv4) or + minor version (e.g., client can use NFSv4.1 but not NFSv4.2); in + either case, the error indicates that for all the supplied + combinations of ffdv_version and ffdv_minorversion, the client + cannot communicate with the storage device. The client can retry the + GETDEVICEINFO to see if the metadata server can provide a different + combination, or it can fall back to doing the I/O through the + metadata server.
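   Pulling together the LAYOUTGET error guidance of Section 5.1.1 and
   the version-mismatch fallback just described, the client-side
   decision logic might be sketched non-normatively as follows (the
   function names and returned labels are inventions of this example,
   not protocol elements):

      def on_layoutget_error(error: str, have_usable_layout: bool) -> str:
          if error == "NFS4ERR_LAYOUTUNAVAILABLE":
              return "io-through-mds"      # no layout will be granted
          if error == "NFS4ERR_LAYOUTTRYLATER":
              # An already-held layout may still be used for I/O.
              return ("continue-io" if have_usable_layout
                      else "retry-layoutget")
          if error == "NFS4ERR_DELAY":
              # Unlike TRYLATER, I/O to the storage devices should
              # also be suspended while retrying.
              return "suspend-io-and-retry"
          raise ValueError(f"unexpected LAYOUTGET error: {error}")

      def on_version_mismatch(new_combination_usable: bool) -> str:
          # After reporting NFS4ERR_MINOR_VERS_MISMATCH via
          # LAYOUTERROR or LAYOUTRETURN and retrying GETDEVICEINFO
          # (Section 5.4), either resume I/O to the storage device or
          # fall back to the metadata server.
          return ("retry-io" if new_combination_usable
                  else "io-through-mds")

   Note the asymmetry the sketch preserves: NFS4ERR_LAYOUTTRYLATER
   permits continued I/O with an existing layout, while NFS4ERR_DELAY
   does not.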
+ + + + +Halevy & Haynes Standards Track [Page 24] + +RFC 8435 pNFS Flexible File Layout August 2018 + + +6. Striping via Sparse Mapping + + While other layout types support both dense and sparse mapping of + logical offsets to physical offsets within a file (see, for example, + Section 13.4 of [RFC5661]), the flexible file layout type only + supports a sparse mapping. + + With sparse mappings, the logical offset within a file (L) is also + the physical offset on the storage device. As detailed in + Section 13.4.4 of [RFC5661], this results in holes across each + storage device that does not contain the current stripe index. + + L: logical offset within the file + + W: stripe width + W = number of elements in ffm_data_servers + + S: number of bytes in a stripe + S = W * ffl_stripe_unit + + N: stripe number + N = L / S + +7. Recovering from Client I/O Errors + + The pNFS client may encounter errors when directly accessing the + storage devices. However, it is the responsibility of the metadata + server to recover from the I/O errors. When the LAYOUT4_FLEX_FILES + layout type is used, the client MUST report the I/O errors to the + server at LAYOUTRETURN time using the ff_ioerr4 structure (see + Section 9.1.1). + + The metadata server analyzes the error and determines the required + recovery operations such as recovering media failures or + reconstructing missing data files. + + The metadata server MUST recall any outstanding layouts to allow it + exclusive write access to the stripes being recovered and to prevent + other clients from hitting the same error condition. In these cases, + the server MUST complete recovery before handing out any new layouts + to the affected byte ranges. + + Although the client implementation has the option to propagate a + corresponding error to the application that initiated the I/O + operation and drop any unwritten data, the client should attempt to + retry the original I/O operation by either requesting a new layout or + sending the I/O via regular NFSv4.1+ READ or WRITE operations to the + metadata server. The client SHOULD attempt to retrieve a new layout + + + +Halevy & Haynes Standards Track [Page 25] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + and retry the I/O operation using the storage device first and only + retry the I/O operation via the metadata server if the error + persists. + +8. Mirroring + + The flexible file layout type has a simple model in place for the + mirroring of the file data constrained by a layout segment. There is + no assumption that each copy of the mirror is stored identically on + the storage devices. For example, one device might employ + compression or deduplication on the data. However, the over-the-wire + transfer of the file contents MUST appear identical. Note, this is a + constraint of the selected XDR representation in which each mirrored + copy of the layout segment has the same striping pattern (see + Figure 1). + + The metadata server is responsible for determining the number of + mirrored copies and the location of each mirror. While the client + may provide a hint to how many copies it wants (see Section 12), the + metadata server can ignore that hint; in any event, the client has no + means to dictate either the storage device (which also means the + coupling and/or protocol levels to access the layout segments) or the + location of said storage device. + + The updating of mirrored layout segments is done via client-side + mirroring. 
+ With this approach, the client is responsible for making sure
+ modifications are made on all copies of the layout segments it is
+ informed of via the layout. If a layout segment is being resilvered
+ to a storage device, that mirrored copy will not be in the layout.
+ Thus, the metadata server MUST update that copy until it is
+ presented to the client in a layout. If FF_FLAGS_WRITE_ONE_MIRROR
+ is set in ffl_flags, the client need only update one of the mirrors
+ (see Section 8.2). If the client is writing to the layout segments
+ via the metadata server, then the metadata server MUST update all
+ copies of the mirror. As seen in Section 8.3, during the
+ resilvering, the layout is recalled, and the client has to make
+ modifications via the metadata server.
+
+8.1. Selecting a Mirror
+
+ When the metadata server grants a layout to a client, it MAY use the
+ ffds_efficiency member to let the client know how fast it expects
+ each mirror to be once the request arrives at the storage devices.
+ While the algorithms to calculate that value are left to the metadata
+ server implementations, factors that could contribute to that
+ calculation include speed of the storage device, physical memory
+ available to the device, operating system version, current load, etc.
+
+
+Halevy & Haynes Standards Track [Page 26]
+
+RFC 8435 pNFS Flexible File Layout August 2018
+
+
+ However, what should not be involved in that calculation is a
+ perceived network distance between the client and the storage device.
+ The client is better situated for making that determination based on
+ past interaction with the storage device over the different available
+ network interfaces between the two; that is, the metadata server
+ might not know about a transient outage between the client and
+ storage device because it has no presence on the given subnet.
+
+ As such, it is the client that decides which mirror to access for
+ reading the file. The requirements for writing to mirrored layout
+ segments are presented below.
+
+8.2. Writing to Mirrors
+
+8.2.1. Single Storage Device Updates Mirrors
+
+ If the FF_FLAGS_WRITE_ONE_MIRROR flag in ffl_flags is set, the client
+ only needs to update one of the copies of the layout segment. For
+ this case, the storage device MUST ensure that all copies of the
+ mirror are updated when any one of the mirrors is updated. If the
+ storage device gets an error when updating one of the mirrors, then
+ it MUST inform the client that the original WRITE had an error. The
+ client then MUST inform the metadata server (see Section 8.2.3). The
+ client's responsibility with respect to COMMIT is explained in
+ Section 8.2.4. The client may choose any one of the mirrors and may
+ use ffds_efficiency as described in Section 8.1 when making this
+ choice.
+
+8.2.2. Client Updates All Mirrors
+
+ If the FF_FLAGS_WRITE_ONE_MIRROR flag in ffl_flags is not set, the
+ client is responsible for updating all mirrored copies of the layout
+ segments that it is given in the layout. A single failed update is
+ sufficient to fail the entire operation. If all but one copy is
+ updated successfully and the last one provides an error, then the
+ client needs to inform the metadata server about the error. The
+ client can use either LAYOUTRETURN or LAYOUTERROR to inform the
+ metadata server that the update failed to that storage device. If
+ the client is updating the mirrors serially, then it SHOULD stop at
+ the first error encountered and report that to the metadata server,
+ as in the non-normative sketch below.
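+
+ The following C fragment sketches the serial case. struct mirror,
+ write_to_mirror(), and report_write_error() are hypothetical
+ client-side helpers, not part of this specification.
+
+ <CODE BEGINS>
+
+ #include <stdint.h>
+
+ struct mirror;            /* hypothetical per-mirror client state */
+
+ int  write_to_mirror(struct mirror *m, const void *buf,
+                      uint64_t off, uint32_t len);
+ void report_write_error(struct mirror *m, uint64_t off,
+                         uint32_t len, int status);
+
+ /* Update each mirrored copy in turn, stopping at the first error,
+  * which is then reported to the metadata server via LAYOUTRETURN
+  * or LAYOUTERROR (see Section 8.2.3). */
+ int
+ update_mirrors_serially(struct mirror **mirrors, int n,
+                         const void *buf, uint64_t off, uint32_t len)
+ {
+     int i, status;
+
+     for (i = 0; i < n; i++) {
+         status = write_to_mirror(mirrors[i], buf, off, len);
+         if (status != 0) {
+             report_write_error(mirrors[i], off, len, status);
+             return status;  /* stop at the first error encountered */
+         }
+     }
+     return 0;               /* all mirrors updated successfully */
+ }
+
+ <CODE ENDS>
+
+ A parallel implementation instead collects the status of every
+ mirror before reporting, per the requirement that follows.
+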
+ If the client is updating the mirrors in parallel, then it SHOULD
+ wait until all storage devices respond so that it can report all
+ errors encountered during the update.
+
+
+Halevy & Haynes Standards Track [Page 27]
+
+RFC 8435 pNFS Flexible File Layout August 2018
+
+
+8.2.3. Handling Write Errors
+
+ When the client reports a write error to the metadata server, the
+ metadata server is responsible for determining if it wants to remove
+ the errant mirror from the layout, if the mirror has recovered from
+ some transient error, etc. When the client tries to get a new
+ layout, the metadata server informs it of the decision by the
+ contents of the layout. The client MUST NOT assume that the contents
+ of the previous layout will match those of the new one. If it has
+ updates that were not committed to all mirrors, then it MUST resend
+ those updates to all mirrors.
+
+ There is no provision in the protocol for the metadata server to
+ directly determine that the client has or has not recovered from an
+ error. For example, suppose a storage device was network partitioned
+ from the client and the client reported the error to the metadata
+ server; the partition might then be repaired and all of the copies
+ successfully updated. There is no mechanism for the client to report
+ that fact, and the metadata server is forced to repair the file
+ across the mirror.
+
+ If the client supports NFSv4.2, it can use LAYOUTERROR and
+ LAYOUTRETURN to provide hints to the metadata server about the
+ recovery efforts. A LAYOUTERROR on a file is for a non-fatal error.
+ A subsequent LAYOUTRETURN without a ff_ioerr4 indicates that the
+ client successfully replayed the I/O to all mirrors. Any
+ LAYOUTRETURN with a ff_ioerr4 is an error that the metadata server
+ needs to repair. The client MUST be prepared for the LAYOUTERROR to
+ trigger a CB_LAYOUTRECALL if the metadata server determines it needs
+ to start repairing the file.
+
+8.2.4. Handling Write COMMITs
+
+ When stable writes are done to the metadata server or to a single
+ replica (if allowed by the use of FF_FLAGS_WRITE_ONE_MIRROR), it is
+ the responsibility of the receiving node to propagate the written
+ data stably, before replying to the client.
+
+ In the corresponding cases in which unstable writes are done, the
+ receiving node does not have any such obligation, although it may
+ choose to asynchronously propagate the updates. However, once a
+ COMMIT is replied to, all replicas must reflect the writes that have
+ been done, and this data must have been committed to stable storage
+ on all replicas.
+
+
+Halevy & Haynes Standards Track [Page 28]
+
+RFC 8435 pNFS Flexible File Layout August 2018
+
+
+ In order to avoid situations in which stale data is read from
+ replicas to which writes have not been propagated:
+
+ o A client that has outstanding unstable writes made to a single
+ node (metadata server or storage device) MUST do all reads from
+ that same node.
+
+ o When writes are flushed to the server (for example, to implement
+ close-to-open semantics), a COMMIT must be done by the client to
+ ensure that up-to-date written data will be available irrespective
+ of the particular replica read.
+
+8.3. Metadata Server Resilvering of the File
+
+ The metadata server may elect to create a new mirror of the layout
+ segments at any time.
This might be to resilver a copy on a storage + device that was down for servicing, to provide a copy of the layout + segments on storage with different storage performance + characteristics, etc. As the client will not be aware of the new + mirror and the metadata server will not be aware of updates that the + client is making to the layout segments, the metadata server MUST + recall the writable layout segment(s) that it is resilvering. If the + client issues a LAYOUTGET for a writable layout segment that is in + the process of being resilvered, then the metadata server can deny + that request with an NFS4ERR_LAYOUTUNAVAILABLE. The client would + then have to perform the I/O through the metadata server. + +9. Flexible File Layout Type Return + + layoutreturn_file4 is used in the LAYOUTRETURN operation to convey + layout-type-specific information to the server. It is defined in + Section 18.44.1 of [RFC5661] as follows: + + <CODE BEGINS> + + /* Constants used for LAYOUTRETURN and CB_LAYOUTRECALL */ + const LAYOUT4_RET_REC_FILE = 1; + const LAYOUT4_RET_REC_FSID = 2; + const LAYOUT4_RET_REC_ALL = 3; + + enum layoutreturn_type4 { + LAYOUTRETURN4_FILE = LAYOUT4_RET_REC_FILE, + LAYOUTRETURN4_FSID = LAYOUT4_RET_REC_FSID, + LAYOUTRETURN4_ALL = LAYOUT4_RET_REC_ALL + }; + + struct layoutreturn_file4 { + offset4 lrf_offset; + + + +Halevy & Haynes Standards Track [Page 29] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + length4 lrf_length; + stateid4 lrf_stateid; + /* layouttype4 specific data */ + opaque lrf_body<>; + }; + + union layoutreturn4 switch(layoutreturn_type4 lr_returntype) { + case LAYOUTRETURN4_FILE: + layoutreturn_file4 lr_layout; + default: + void; + }; + + struct LAYOUTRETURN4args { + /* CURRENT_FH: file */ + bool lora_reclaim; + layouttype4 lora_layout_type; + layoutiomode4 lora_iomode; + layoutreturn4 lora_layoutreturn; + }; + + <CODE ENDS> + + If the lora_layout_type layout type is LAYOUT4_FLEX_FILES and the + lr_returntype is LAYOUTRETURN4_FILE, then the lrf_body opaque value + is defined by ff_layoutreturn4 (see Section 9.3). This allows the + client to report I/O error information or layout usage statistics + back to the metadata server as defined below. Note that while the + data structures are built on concepts introduced in NFSv4.2, the + effective discriminated union (lora_layout_type combined with + ff_layoutreturn4) allows for an NFSv4.1 metadata server to utilize + the data. + +9.1. I/O Error Reporting + +9.1.1. ff_ioerr4 + + <CODE BEGINS> + + /// struct ff_ioerr4 { + /// offset4 ffie_offset; + /// length4 ffie_length; + /// stateid4 ffie_stateid; + /// device_error4 ffie_errors<>; + /// }; + /// + + <CODE ENDS> + + + +Halevy & Haynes Standards Track [Page 30] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + Recall that [RFC7862] defines device_error4 as: + + <CODE BEGINS> + + struct device_error4 { + deviceid4 de_deviceid; + nfsstat4 de_status; + nfs_opnum4 de_opnum; + }; + + <CODE ENDS> + + The ff_ioerr4 structure is used to return error indications for data + files that generated errors during data transfers. These are hints + to the metadata server that there are problems with that file. For + each error, ffie_errors.de_deviceid, ffie_offset, and ffie_length + represent the storage device and byte range within the file in which + the error occurred; ffie_errors represents the operation and type of + error. The use of device_error4 is described in Section 15.6 of + [RFC7862]. 
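+
+ As a non-normative illustration, the fragment below shows a client
+ recording a single failed WRITE in an ff_ioerr4. The stand-in
+ typedefs only approximate the rpcgen mapping of the XDR types; real
+ clients use the bindings generated from [RFC5662] and Section 3 of
+ this document, and record_write_error() is a hypothetical helper.
+ The status mapping it expects is described in the paragraph that
+ follows.
+
+ <CODE BEGINS>
+
+ #include <stdint.h>
+ #include <string.h>
+
+ typedef uint64_t offset4;
+ typedef uint64_t length4;
+ typedef char     deviceid4[16];      /* NFS4_DEVICEID4_SIZE */
+ typedef uint32_t nfsstat4;
+ typedef uint32_t nfs_opnum4;
+ #define OP_WRITE 38                  /* from [RFC5661] */
+
+ typedef struct { uint32_t seqid; char other[12]; } stateid4;
+
+ typedef struct {
+     deviceid4  de_deviceid;
+     nfsstat4   de_status;
+     nfs_opnum4 de_opnum;
+ } device_error4;
+
+ typedef struct {
+     offset4  ffie_offset;
+     length4  ffie_length;
+     stateid4 ffie_stateid;
+     struct { uint32_t len; device_error4 *val; } ffie_errors;
+ } ff_ioerr4;
+
+ /* Record one failed WRITE against one storage device; the caller
+  * supplies an NFSv4 status, already mapped from any NFSv3 error. */
+ void
+ record_write_error(ff_ioerr4 *ioerr, const deviceid4 dev,
+                    const stateid4 *stid, offset4 off, length4 len,
+                    nfsstat4 mapped_status, device_error4 *err)
+ {
+     memcpy(err->de_deviceid, dev, sizeof(deviceid4));
+     err->de_status = mapped_status;
+     err->de_opnum  = OP_WRITE;
+
+     ioerr->ffie_offset  = off;
+     ioerr->ffie_length  = len;
+     ioerr->ffie_stateid = *stid;
+     ioerr->ffie_errors.len = 1;   /* one error for this byte range */
+     ioerr->ffie_errors.val = err;
+ }
+
+ <CODE ENDS>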
+ + Even though the storage device might be accessed via NFSv3 and + reports back NFSv3 errors to the client, the client is responsible + for mapping these to appropriate NFSv4 status codes as de_status. + Likewise, the NFSv3 operations need to be mapped to equivalent NFSv4 + operations. + +9.2. Layout Usage Statistics + +9.2.1. ff_io_latency4 + + <CODE BEGINS> + + /// struct ff_io_latency4 { + /// uint64_t ffil_ops_requested; + /// uint64_t ffil_bytes_requested; + /// uint64_t ffil_ops_completed; + /// uint64_t ffil_bytes_completed; + /// uint64_t ffil_bytes_not_delivered; + /// nfstime4 ffil_total_busy_time; + /// nfstime4 ffil_aggregate_completion_time; + /// }; + /// + + <CODE ENDS> + + + + + + +Halevy & Haynes Standards Track [Page 31] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + Both operation counts and bytes transferred are kept in the + ff_io_latency4. As seen in ff_layoutupdate4 (see Section 9.2.2), + READ and WRITE operations are aggregated separately. READ operations + are used for the ff_io_latency4 ffl_read. Both WRITE and COMMIT + operations are used for the ff_io_latency4 ffl_write. "Requested" + counters track what the client is attempting to do, and "completed" + counters track what was done. There is no requirement that the + client only report completed results that have matching requested + results from the reported period. + + ffil_bytes_not_delivered is used to track the aggregate number of + bytes requested but not fulfilled due to error conditions. + ffil_total_busy_time is the aggregate time spent with outstanding RPC + calls. ffil_aggregate_completion_time is the sum of all round-trip + times for completed RPC calls. + + In Section 3.3.1 of [RFC5661], the nfstime4 is defined as the number + of seconds and nanoseconds since midnight or zero hour January 1, + 1970 Coordinated Universal Time (UTC). The use of nfstime4 in + ff_io_latency4 is to store time since the start of the first I/O from + the client after receiving the layout. In other words, these are to + be decoded as duration and not as a date and time. + + Note that LAYOUTSTATS are cumulative, i.e., not reset each time the + operation is sent. If two LAYOUTSTATS operations for the same file + and layout stateid originate from the same NFS client and are + processed at the same time by the metadata server, then the one + containing the larger values contains the most recent time series + data. + +9.2.2. ff_layoutupdate4 + + <CODE BEGINS> + + /// struct ff_layoutupdate4 { + /// netaddr4 ffl_addr; + /// nfs_fh4 ffl_fhandle; + /// ff_io_latency4 ffl_read; + /// ff_io_latency4 ffl_write; + /// nfstime4 ffl_duration; + /// bool ffl_local; + /// }; + /// + + <CODE ENDS> + + + + + + +Halevy & Haynes Standards Track [Page 32] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + ffl_addr differentiates which network address the client is connected + to on the storage device. In the case of multipathing, ffl_fhandle + indicates which read-only copy was selected. ffl_read and ffl_write + convey the latencies for both READ and WRITE operations, + respectively. ffl_duration is used to indicate the time period over + which the statistics were collected. If true, ffl_local indicates + that the I/O was serviced by the client's cache. This flag allows + the client to inform the metadata server about "hot" access to a file + it would not normally be allowed to report on. + +9.2.3. 
ff_iostats4 + + <CODE BEGINS> + + /// struct ff_iostats4 { + /// offset4 ffis_offset; + /// length4 ffis_length; + /// stateid4 ffis_stateid; + /// io_info4 ffis_read; + /// io_info4 ffis_write; + /// deviceid4 ffis_deviceid; + /// ff_layoutupdate4 ffis_layoutupdate; + /// }; + /// + + <CODE ENDS> + + [RFC7862] defines io_info4 as: + + <CODE BEGINS> + + struct io_info4 { + uint64_t ii_count; + uint64_t ii_bytes; + }; + + <CODE ENDS> + + With pNFS, data transfers are performed directly between the pNFS + client and the storage devices. Therefore, the metadata server has + no direct knowledge of the I/O operations being done and thus cannot + create on its own statistical information about client I/O to + optimize the data storage location. ff_iostats4 MAY be used by the + client to report I/O statistics back to the metadata server upon + returning the layout. + + + + + + +Halevy & Haynes Standards Track [Page 33] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + Since it is not feasible for the client to report every I/O that used + the layout, the client MAY identify "hot" byte ranges for which to + report I/O statistics. The definition and/or configuration mechanism + of what is considered "hot" and the size of the reported byte range + are out of the scope of this document. For client implementation, + providing reasonable default values and an optional run-time + management interface to control these parameters is suggested. For + example, a client can define the default byte-range resolution to be + 1 MB in size and the thresholds for reporting to be 1 MB/second or 10 + I/O operations per second. + + For each byte range, ffis_offset and ffis_length represent the + starting offset of the range and the range length in bytes. + ffis_read.ii_count, ffis_read.ii_bytes, ffis_write.ii_count, and + ffis_write.ii_bytes represent the number of contiguous READ and WRITE + I/Os and the respective aggregate number of bytes transferred within + the reported byte range. + + The combination of ffis_deviceid and ffl_addr uniquely identifies + both the storage path and the network route to it. Finally, + ffl_fhandle allows the metadata server to differentiate between + multiple read-only copies of the file on the same storage device. + +9.3. ff_layoutreturn4 + + <CODE BEGINS> + + /// struct ff_layoutreturn4 { + /// ff_ioerr4 fflr_ioerr_report<>; + /// ff_iostats4 fflr_iostats_report<>; + /// }; + /// + + <CODE ENDS> + + When data file I/O operations fail, fflr_ioerr_report<> is used to + report these errors to the metadata server as an array of elements of + type ff_ioerr4. Each element in the array represents an error that + occurred on the data file identified by ffie_errors.de_deviceid. If + no errors are to be reported, the size of the fflr_ioerr_report<> + array is set to zero. The client MAY also use fflr_iostats_report<> + to report a list of I/O statistics as an array of elements of type + ff_iostats4. Each element in the array represents statistics for a + particular byte range. Byte ranges are not guaranteed to be disjoint + and MAY repeat or intersect. + + + + + + +Halevy & Haynes Standards Track [Page 34] + +RFC 8435 pNFS Flexible File Layout August 2018 + + +10. Flexible File Layout Type LAYOUTERROR + + If the client is using NFSv4.2 to communicate with the metadata + server, then instead of waiting for a LAYOUTRETURN to send error + information to the metadata server (see Section 9.1), it MAY use + LAYOUTERROR (see Section 15.6 of [RFC7862]) to communicate that + information. 
For the flexible file layout type, this means that
+ LAYOUTERROR4args is treated the same as ff_ioerr4.
+
+11. Flexible File Layout Type LAYOUTSTATS
+
+ If the client is using NFSv4.2 to communicate with the metadata
+ server, then instead of waiting for a LAYOUTRETURN to send I/O
+ statistics to the metadata server (see Section 9.2), it MAY use
+ LAYOUTSTATS (see Section 15.7 of [RFC7862]) to communicate that
+ information. For the flexible file layout type, this means that
+ LAYOUTSTATS4args.lsa_layoutupdate is overloaded with the same
+ contents as in ffis_layoutupdate.
+
+12. Flexible File Layout Type Creation Hint
+
+ The layouthint4 type is defined in [RFC5661] as follows:
+
+ <CODE BEGINS>
+
+ struct layouthint4 {
+ layouttype4 loh_type;
+ opaque loh_body<>;
+ };
+
+ <CODE ENDS>
+
+ The layouthint4 structure is used by the client to pass a hint about
+ the type of layout it would like created for a particular file. If
+ the loh_type layout type is LAYOUT4_FLEX_FILES, then the loh_body
+ opaque value is defined by the ff_layouthint4 type.
+
+12.1. ff_layouthint4
+
+ <CODE BEGINS>
+
+ /// union ff_mirrors_hint switch (bool ffmc_valid) {
+ /// case TRUE:
+ /// uint32_t ffmc_mirrors;
+ /// case FALSE:
+ /// void;
+ /// };
+ ///
+
+
+Halevy & Haynes Standards Track [Page 35]
+
+RFC 8435 pNFS Flexible File Layout August 2018
+
+
+ /// struct ff_layouthint4 {
+ /// ff_mirrors_hint fflh_mirrors_hint;
+ /// };
+ ///
+
+ <CODE ENDS>
+
+ This type conveys hints for the desired data map. All parameters
+ are optional, so the client can give values for only the parameters
+ it cares about.
+
+13. Recalling a Layout
+
+ While Section 12.5.5 of [RFC5661] discusses reasons independent of
+ layout type for recalling a layout, the flexible file layout type
+ metadata server should recall outstanding layouts in the following
+ cases:
+
+ o When the file's security policy changes, i.e., ACLs or permission
+ mode bits are set.
+
+ o When the file's layout changes, rendering outstanding layouts
+ invalid.
+
+ o When existing layouts are inconsistent with the need to enforce
+ locking constraints.
+
+ o When existing layouts are inconsistent with the requirements
+ regarding resilvering as described in Section 8.3.
+
+13.1. CB_RECALL_ANY
+
+ The metadata server can use the CB_RECALL_ANY callback operation to
+ notify the client to return some or all of its layouts. Section 22.3
+ of [RFC5661] defines the allowed types of the "NFSv4 Recallable
+ Object Types Registry".
+
+ <CODE BEGINS>
+
+ /// const RCA4_TYPE_MASK_FF_LAYOUT_MIN = 16;
+ /// const RCA4_TYPE_MASK_FF_LAYOUT_MAX = 17;
+ ///
+
+ struct CB_RECALL_ANY4args {
+ uint32_t craa_layouts_to_keep;
+ bitmap4 craa_type_mask;
+ };
+
+
+Halevy & Haynes Standards Track [Page 36]
+
+RFC 8435 pNFS Flexible File Layout August 2018
+
+
+ <CODE ENDS>
+
+ Typically, CB_RECALL_ANY will be used to recall client state when the
+ server needs to reclaim resources. The craa_type_mask bitmap
+ specifies the type of resources that are recalled, and the
+ craa_layouts_to_keep value specifies how many of the recalled
+ flexible file layouts the client is allowed to keep. The mask flags
+ for the flexible file layout type are defined as follows:
+
+ <CODE BEGINS>
+
+ /// enum ff_cb_recall_any_mask {
+ /// PNFS_FF_RCA4_TYPE_MASK_READ = 16,
+ /// PNFS_FF_RCA4_TYPE_MASK_RW = 17
+ /// };
+ ///
+
+ <CODE ENDS>
+
+ The flags represent the iomode of the recalled layouts; a
+ non-normative sketch of testing these mask bits follows.
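+
+ The C fragment below shows one way to test the two mask bits. It
+ assumes the bitmap4 encoding of Section 3.3.7 of [RFC5661], in which
+ bit N is held in 32-bit word N/32 at bit position N%32;
+ recalled_iomodes() is a hypothetical helper.
+
+ <CODE BEGINS>
+
+ #include <stdbool.h>
+ #include <stdint.h>
+
+ static bool
+ bitmap_test(const uint32_t *words, uint32_t nwords, uint32_t bit)
+ {
+     return bit / 32 < nwords &&
+            (words[bit / 32] & (1u << (bit % 32))) != 0;
+ }
+
+ /* Determine which iomodes the client is being asked to return
+  * layouts for, from the craa_type_mask of CB_RECALL_ANY. */
+ void
+ recalled_iomodes(const uint32_t *mask, uint32_t nwords,
+                  bool *recall_read, bool *recall_rw)
+ {
+     *recall_read = bitmap_test(mask, nwords, 16);  /* ..._READ */
+     *recall_rw   = bitmap_test(mask, nwords, 17);  /* ..._RW   */
+ }
+
+ <CODE ENDS>
+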
+ In response, the client SHOULD return layouts of the recalled iomode
+ that it needs the least, keeping at most craa_layouts_to_keep
+ flexible file layouts.
+
+ The PNFS_FF_RCA4_TYPE_MASK_READ flag notifies the client to return
+ layouts of iomode LAYOUTIOMODE4_READ. Similarly, the
+ PNFS_FF_RCA4_TYPE_MASK_RW flag notifies the client to return layouts
+ of iomode LAYOUTIOMODE4_RW. When both mask flags are set, the client
+ is notified to return layouts of either iomode.
+
+14. Client Fencing
+
+ In cases where clients are uncommunicative and their lease has
+ expired or when clients fail to return recalled layouts within a
+ lease period, the server MAY revoke client layouts and reassign these
+ resources to other clients (see Section 12.5.5 of [RFC5661]). To
+ avoid data corruption, the metadata server MUST fence off the revoked
+ clients from the respective data files as described in Section 2.2.
+
+15. Security Considerations
+
+ The combination of components in a pNFS system is required to
+ preserve the security properties of NFSv4.1+ with respect to an
+ entity accessing data via a client. The pNFS feature partitions the
+ NFSv4.1+ file system protocol into two parts: the control protocol
+ and the data protocol. As the control protocol in this document is
+ NFS, the security properties are equivalent to the version of NFS
+ being used. The flexible file layout further divides the data
+
+
+Halevy & Haynes Standards Track [Page 37]
+
+RFC 8435 pNFS Flexible File Layout August 2018
+
+
+ protocol into metadata and data paths. The security properties of
+ the metadata path are equivalent to those of NFSv4.1+ (see Sections
+ 1.7.1 and 2.2.1 of [RFC5661]). The security properties of the data
+ path are equivalent to those of the version of NFS used to access
+ the storage device, with the provision that the metadata server is
+ responsible for authenticating client access to the data file. The
+ metadata server provides appropriate credentials to the client to
+ access data files on the storage device. It is also responsible for
+ revoking access for a client to the storage device.
+
+ The metadata server enforces the file access control policy at
+ LAYOUTGET time. The client should use RPC authorization credentials
+ for getting the layout for the requested iomode (LAYOUTIOMODE4_READ
+ or LAYOUTIOMODE4_RW), and the server verifies the permissions and ACL
+ for these credentials, possibly returning NFS4ERR_ACCESS if the
+ client is not allowed the requested iomode. If the LAYOUTGET
+ operation succeeds, the client receives, as part of the layout, a set
+ of credentials allowing it I/O access to the specified data files
+ corresponding to the requested iomode. When the client acts on I/O
+ operations on behalf of its local users, it MUST authenticate and
+ authorize the user by issuing respective OPEN and ACCESS calls to the
+ metadata server, similar to having NFSv4 data delegations.
+
+ The combination of filehandle, synthetic uid, and gid in the layout
+ is the way that the metadata server enforces access control to the
+ data server. The client only has access to filehandles of file
+ objects and not directory objects. Thus, given a filehandle in a
+ layout, it is not possible to guess the parent directory filehandle.
+ Further, as the data file permissions only allow the given synthetic
+ uid read/write permission and the given synthetic gid read
+ permission, knowing the synthetic ids of one file does not
+ necessarily allow access to any other data file on the storage
+ device.
+
+ The metadata server can also deny access at any time by fencing the
+ data file, which means changing the synthetic ids. In turn, that
+ forces the client to return its current layout and get a new layout
+ if it wants to continue I/O to the data file.
+
+ If access is allowed, the client uses the corresponding (read-only or
+ read/write) credentials to perform the I/O operations at the data
+ file's storage devices. When the metadata server receives a request
+ to change a file's permissions or ACL, it SHOULD recall all layouts
+ for that file and then MUST fence off any clients still holding
+ outstanding layouts for the respective files by implicitly
+ invalidating the previously distributed credential on all data files
+ comprising the file in question. It is REQUIRED that this be done
+ before committing to the new permissions and/or ACL. By requesting
+
+
+Halevy & Haynes Standards Track [Page 38]
+
+RFC 8435 pNFS Flexible File Layout August 2018
+
+
+ new layouts, the clients will reauthorize access against the modified
+ access control metadata. Recalling the layouts in this case is
+ intended to prevent clients from getting an error on I/Os done after
+ the client was fenced off.
+
+15.1. RPCSEC_GSS and Security Services
+
+ Because of the special use of principals within the loosely coupled
+ model, the issues are different depending on the coupling model.
+
+15.1.1. Loosely Coupled
+
+ RPCSEC_GSS version 3 (RPCSEC_GSSv3) [RFC7861] contains facilities
+ that would allow it to be used to authorize the client to the storage
+ device on behalf of the metadata server. Doing so would require the
+ metadata server, storage device, and client to each implement
+ RPCSEC_GSSv3 using an RPC-application-defined structured privilege
+ assertion in a manner described in Section 4.9.1 of [RFC7862]. The
+ specifics necessary to do so are not described in this document.
+ This is principally because any such specification would require
+ extensive implementation work on a wide range of storage devices,
+ which would be unlikely to result in a widely usable specification
+ for a considerable time.
+
+ As a result, the layout type described in this document will not
+ provide support for use of RPCSEC_GSS together with the loosely
+ coupled model. However, future layout types could be specified that
+ allow such support, either through the use of RPCSEC_GSSv3 or in
+ other ways.
+
+15.1.2. Tightly Coupled
+
+ With tight coupling, the principal used to access the metadata file
+ is exactly the same as used to access the data file. The storage
+ device can use the control protocol to validate any RPC credentials.
+ As a result, there are no security issues related to using RPCSEC_GSS
+ with a tightly coupled system. For example, if Kerberos V5 Generic
+ Security Service Application Program Interface (GSS-API) [RFC4121] is
+ used as the security mechanism, then the storage device could use a
+ control protocol to validate the RPC credentials to the metadata
+ server.
+
+16. IANA Considerations
+
+ [RFC5661] introduced the "pNFS Layout Types Registry"; new layout
+ type numbers in this registry need to be assigned by IANA.
This + document defines the protocol associated with an existing layout type + number: LAYOUT4_FLEX_FILES. See Table 1. + + + +Halevy & Haynes Standards Track [Page 39] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + +--------------------+------------+----------+-----+----------------+ + | Layout Type Name | Value | RFC | How | Minor Versions | + +--------------------+------------+----------+-----+----------------+ + | LAYOUT4_FLEX_FILES | 0x00000004 | RFC 8435 | L | 1 | + +--------------------+------------+----------+-----+----------------+ + + Table 1: Layout Type Assignments + + [RFC5661] also introduced the "NFSv4 Recallable Object Types + Registry". This document defines new recallable objects for + RCA4_TYPE_MASK_FF_LAYOUT_MIN and RCA4_TYPE_MASK_FF_LAYOUT_MAX (see + Table 2). + + +------------------------------+-------+--------+-----+-------------+ + | Recallable Object Type Name | Value | RFC | How | Minor | + | | | | | Versions | + +------------------------------+-------+--------+-----+-------------+ + | RCA4_TYPE_MASK_FF_LAYOUT_MIN | 16 | RFC | L | 1 | + | | | 8435 | | | + | RCA4_TYPE_MASK_FF_LAYOUT_MAX | 17 | RFC | L | 1 | + | | | 8435 | | | + +------------------------------+-------+--------+-----+-------------+ + + Table 2: Recallable Object Type Assignments + +17. References + +17.1. Normative References + + [LEGAL] IETF Trust, "Trust Legal Provisions (TLP)", + <https://trustee.ietf.org/trust-legal-provisions.html>. + + [RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS + Version 3 Protocol Specification", RFC 1813, + DOI 10.17487/RFC1813, June 1995, + <https://www.rfc-editor.org/info/rfc1813>. + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, + DOI 10.17487/RFC2119, March 1997, + <https://www.rfc-editor.org/info/rfc2119>. + + [RFC4121] Zhu, L., Jaganathan, K., and S. Hartman, "The Kerberos + Version 5 Generic Security Service Application Program + Interface (GSS-API) Mechanism: Version 2", RFC 4121, + DOI 10.17487/RFC4121, July 2005, + <https://www.rfc-editor.org/info/rfc4121>. + + + + +Halevy & Haynes Standards Track [Page 40] + +RFC 8435 pNFS Flexible File Layout August 2018 + + + [RFC4506] Eisler, M., Ed., "XDR: External Data Representation + Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, May + 2006, <https://www.rfc-editor.org/info/rfc4506>. + + [RFC5531] Thurlow, R., "RPC: Remote Procedure Call Protocol + Specification Version 2", RFC 5531, DOI 10.17487/RFC5531, + May 2009, <https://www.rfc-editor.org/info/rfc5531>. + + [RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., + "Network File System (NFS) Version 4 Minor Version 1 + Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010, + <https://www.rfc-editor.org/info/rfc5661>. + + [RFC5662] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., + "Network File System (NFS) Version 4 Minor Version 1 + External Data Representation Standard (XDR) Description", + RFC 5662, DOI 10.17487/RFC5662, January 2010, + <https://www.rfc-editor.org/info/rfc5662>. + + [RFC7530] Haynes, T., Ed. and D. Noveck, Ed., "Network File System + (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, + March 2015, <https://www.rfc-editor.org/info/rfc7530>. + + [RFC7861] Adamson, A. and N. Williams, "Remote Procedure Call (RPC) + Security Version 3", RFC 7861, DOI 10.17487/RFC7861, + November 2016, <https://www.rfc-editor.org/info/rfc7861>. 
+ + [RFC7862] Haynes, T., "Network File System (NFS) Version 4 Minor + Version 2 Protocol", RFC 7862, DOI 10.17487/RFC7862, + November 2016, <https://www.rfc-editor.org/info/rfc7862>. + + [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC + 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, + May 2017, <https://www.rfc-editor.org/info/rfc8174>. + + [RFC8434] Haynes, T., "Requirements for Parallel NFS (pNFS) Layout + Types", RFC 8434, DOI 10.17487/RFC8434, August 2018, + <https://www.rfc-editor.org/info/rfc8434>. + +17.2. Informative References + + [RFC4519] Sciberras, A., Ed., "Lightweight Directory Access Protocol + (LDAP): Schema for User Applications", RFC 4519, + DOI 10.17487/RFC4519, June 2006, + <https://www.rfc-editor.org/info/rfc4519>. + + + + + + +Halevy & Haynes Standards Track [Page 41] + +RFC 8435 pNFS Flexible File Layout August 2018 + + +Acknowledgments + + The following individuals provided miscellaneous comments to early + draft versions of this document: Matt W. Benjamin, Adam Emerson, + J. Bruce Fields, and Lev Solomonov. + + The following individuals provided miscellaneous comments to the + final draft versions of this document: Anand Ganesh, Robert Wipfel, + Gobikrishnan Sundharraj, Trond Myklebust, Rick Macklem, and Jim + Sermersheim. + + Idan Kedar caught a nasty bug in the interaction of client-side + mirroring and the minor versioning of devices. + + Dave Noveck provided comprehensive reviews of the document during the + working group last calls. He also rewrote Section 2.3. + + Olga Kornievskaia made a convincing case against the use of a + credential versus a principal in the fencing approach. Andy Adamson + and Benjamin Kaduk helped to sharpen the focus. + + Benjamin Kaduk and Olga Kornievskaia also helped provide concrete + scenarios for loosely coupled security mechanisms. In the end, Olga + proved that as defined, the loosely coupled model would not work with + RPCSEC_GSS. + + Tigran Mkrtchyan provided the use case for not allowing the client to + proxy the I/O through the data server. + + Rick Macklem provided the use case for only writing to a single + mirror. + +Authors' Addresses + + Benny Halevy + + Email: bhalevy@gmail.com + + + Thomas Haynes + Hammerspace + 4300 El Camino Real Ste 105 + Los Altos, CA 94022 + United States of America + + Email: loghyr@gmail.com + + + + + +Halevy & Haynes Standards Track [Page 42] + |