diff options
Diffstat (limited to 'doc/rfc/rfc9277.txt')
-rw-r--r-- | doc/rfc/rfc9277.txt | 960 |
1 files changed, 960 insertions, 0 deletions
diff --git a/doc/rfc/rfc9277.txt b/doc/rfc/rfc9277.txt new file mode 100644 index 0000000..1e625db --- /dev/null +++ b/doc/rfc/rfc9277.txt @@ -0,0 +1,960 @@ + + + + +Internet Engineering Task Force (IETF) M. Richardson +Request for Comments: 9277 Sandelman Software Works +Category: Standards Track C. Bormann +ISSN: 2070-1721 Universität Bremen TZI + August 2022 + + + On Stable Storage for Items in Concise Binary Object Representation + (CBOR) + +Abstract + + This document defines a stored ("file") format for Concise Binary + Object Representation (CBOR) data items that is friendly to common + systems that recognize file types, such as the Unix file(1) command. + +Status of This Memo + + This is an Internet Standards Track document. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Further information on + Internet Standards is available in Section 2 of RFC 7841. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + https://www.rfc-editor.org/info/rfc9277. + +Copyright Notice + + Copyright (c) 2022 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Revised BSD License text as described in Section 4.e of the + Trust Legal Provisions and are provided without warranty as described + in the Revised BSD License. + +Table of Contents + + 1. Introduction + 1.1. Terminology + 1.2. Requirements for a Magic Number + 2. Protocol + 2.1. The CBOR-Protocol-Specific Tag + 2.2. Enveloping Method: CBOR Tag Wrapped + 2.2.1. Example + 2.3. Enveloping Method: Labeled CBOR Sequence + 2.3.1. Example + 3. Security Considerations + 4. IANA Considerations + 4.1. Labeled CBOR Sequence Tag + 4.2. CBOR-Labeled Non-CBOR Data Tag + 4.3. CBOR Tags for CoAP Content-Format Numbers + 5. References + 5.1. Normative References + 5.2. Informative References + Appendix A. Advice to Protocol Designer + A.1. Is the on-wire format new? + A.2. Can many items be trivially concatenated? + A.3. Are there tags at the start? + Appendix B. CBOR Tags for CoAP Content Formats + B.1. Content-Format Tag Examples + Appendix C. Example from Openswan + Appendix D. Using CBOR Labels for Non-CBOR Data + D.1. Content-Format Tag Examples + Acknowledgements + Contributors + Authors' Addresses + +1. Introduction + + Since very early in computing, operating systems have sought ways to + mark which files could be processed by which programs. In Unix, + everything is a stream of bytes; identifying the contents of a stream + of bytes became a heuristic activity. + + For instance, the Unix file(1) command, which has existed since 1973 + [FILE], has been able to identify many file formats based upon the + contents of the file for decades. + + Many systems (Linux, macOS, Windows) will select the correct + application based upon the file contents if the system cannot + determine it by other means. For instance, in classical Mac OS, a + resource fork was maintained separately from the file data that + included file type information; this way, the OS ideally never needed + to know anything about the file data contents to determine the media + type. + + Many other systems do this by using file extensions. Many common Web + servers derive the media-type information from file extensions. + + Having a media type associated with the file contents can avoid some + of the brittleness of this approach. When files become disconnected + from their type information, such as when attempting to do forensics + on a damaged system, being able to identify the type of information + stored in a file can become very important. + + A common way to identify the type of a file from its contents is to + place a "magic number" at the start of the file contents [MAGIC]. In + the media type registration template [RFC6838], a magic number is + asked for, if available, as is a file extension. + + A challenge for the file(1) command is often that it can be confused + by recognizing the overall encoding but not the content being + encoded. For instance, an Android Package Kit APK (as used to + transfer and store an application) may be identified as a ZIP file. + Additionally, both OpenOffice and MSOffice files are ZIP files of XML + files; the identification may stop at identifying them as ZIP files. + + As CBOR becomes a more and more common encoding for a wide variety of + artifacts, identifying them as just "CBOR" is probably not + sufficient. This document provides a way to encode a magic number + into the beginning of a CBOR format file. As a CBOR format may use a + single CBOR data item or a CBOR sequence of data items [RFC8742], two + possible methods of enveloping data are presented; a CBOR Protocol + designer will specify one. (A CBOR Protocol is a specification that + uses CBOR as its encoding.) + + This document also gives advice to designers of CBOR Protocols on + choosing one of these mechanisms for identifying their contents. + This advice is informative. + + A third method is also proposed by which a CBOR format tag is + prepended to identify non-CBOR files. Further information on this + method appears in Appendix D because it is not about identifying + media types containing CBOR-encoded data items. This includes a + simple way to derive a magic number for content-formats as defined in + [RFC7252], even if the file is not in CBOR form. + + Examples of CBOR Protocols currently under development include + Concise Software Identification Tags [CoSWID] and Entity Attestation + Tokens [EAT]. CBOR Object Signing and Encryption (COSE) itself + [RFC8152] is considered infrastructure. The encoding of public keys + in CBOR as _C509_ as described in [C509-CERT] would benefit from + being an identified CBOR Protocol. + + A major inspiration for this document is observing the disarray in + certain ASN.1-based systems where most files are Privacy-Enhanced + Mail (PEM) encoded; these files are all identified by the extension + "pem", which confounds public keys, private keys, certificate + requests, and S/MIME content. + + While the envelopes defined in this specification add information to + how data conforming to CBOR Protocols are stored in files, there is + no requirement that either type of envelope be transferred on the + wire. However, there are some protocols that may benefit from having + such a magic number on the wire if they are presently using a + different (legacy) encoding scheme. The presence of the identifiable + magic sequence can be used to signal that a CBOR Protocol is being + used as opposed to a legacy scheme. + +1.1. Terminology + + Byte is a synonym for octet. The term "byte string" refers to the + data item defined in [STD94]. + + The term "file" is understood to stand in a general way for a stored + representation that is somewhat detached from the original context of + usage of that representation; its usage in this document encompasses + similar units of storage that may have different identification + schemes such as partitions or media blocks. + + The term "diagnostic notation" refers to the human-readable notation + for CBOR data items defined in Section 8 of [STD94] and Appendix G of + [RFC8610]. + + The term "CDDL" (Concise Data Definition Language) refers to the + language defined in [RFC8610]. + + The function TN(ct) is defined in Appendix B. + +1.2. Requirements for a Magic Number + + Ideally, a magic number is a fingerprint that is unique to a specific + CBOR Protocol, is present in the first few (small multiple of 4) + bytes of the file and does not change when the contents change, and + does not depend upon the length of the file. + + Less ideal solutions have a pattern that needs to be matched, but in + which some bytes need to be ignored. While the Unix file(1) command + can be told to ignore certain bytes, this can lead to ambiguities. + +2. Protocol + + This section presents two enveloping methods. Both use CBOR tags in + a way that results in a deterministic first 8 to 12 bytes. The + Protocol designer determines which one to use; see Appendix A for + some guidance. + +2.1. The CBOR-Protocol-Specific Tag + + In both enveloping methods, CBOR Protocol designers need to obtain a + CBOR tag for each kind of object that they might store in files. As + there are more than 4 billion available 4-byte tags, there should be + little issue in allocating a few to each available CBOR Protocol. + + The IANA policy for 4-byte CBOR tags is First Come First Served + [RFC8126] so only a simple interaction (e.g., via Web or email) with + IANA is required. The interaction includes filling in the small + template provided in Section 9.2 of [STD94]. In the template, a + reference to this specification (RFC 9277) alongside the Description + of semantics is suggested. + + Allocation of the CBOR tag needs to be initiated by the designer of + the CBOR Protocol, who can provide a proposed tag number. In order + to be in the 4-byte range, and so that there are no leading zero + bytes in the 4-byte encoding of the tag number, the value needs to be + in the range 0x01000000 (decimal 16777216) to 0xFFFFFFFF (decimal + 4294967295) inclusive. It is further suggested to avoid values that + have an embedded zero byte in the 4 bytes of their binary + representation (such as 0x12003456), as these may confuse + implementations that treat the magic number as a C string. + + The use of a sequence of four ASCII [RFC20] codes which are mnemonic + to the protocol is encouraged, but not required (there may be reasons + to encode other information into the tag; see Appendix B for an + example). For instance, Appendix C uses "OPSN", which translates to + the tag number 1330664270 registered for it. + + In [IANA.CORE-PARAMETERS], the Constrained Application Protocol + (CoAP) defines the "CoAP Content-Formats" registry to assign Content- + Format Numbers (Section 12.3 of [RFC7252]) to Content Types in a + specific Content Coding. For CBOR data items that form a + representation that is already described by such a Content-Format + Number, a tag number has proactively been allocated in Section 4.3 + (see Appendix B for details and examples). + +2.2. Enveloping Method: CBOR Tag Wrapped + + The CBOR Tag Wrapped method is appropriate for use with CBOR + Protocols that encode a single CBOR data item. This data item is + enveloped into two nested tags: + + The outer tag is a self-described CBOR tag, 55799, as described in + Section 3.4.6 of [STD94]. + + The tag content of the outer tag is a second CBOR tag whose tag + number has been allocated to describe the specific Protocol involved, + as discussed in Section 2.1. The tag content of this inner tag is + the single CBOR data item. + + This method wraps the CBOR data item as CBOR tags usually do. + Applications that need to send the stored CBOR data item across a + constrained network may wish to remove the two tags if the type is + understood from the protocol context, e.g., from a CoAP Content- + Format Option (Section 5.10.3 of [RFC7252]). Therefore, a CBOR + Protocol specification may pick the specific cases where the CBOR Tag + Wrapped enveloping method is to be used. For instance, it might + specify its use for storing the representation in a local file or for + Web access, but not within protocol messages that already provide the + necessary context. + +2.2.1. Example + + To construct an example without registering a new tag, we use the + Content-Format ID assigned for application/senml+cbor (112) [RFC8428] + of the "CoAP Content-Formats" registry [IANA.CORE-PARAMETERS]). + + Using the technique described in Appendix B, this translates into the + tag TN(112) = 1668546929. + + With this tag, the SenML-CBOR pack [{0: "current", 6: 3, 2: 1.5}] + would be enveloped as follows (in diagnostic notation): + + 55799(1668546929([{0: "current", 6: 3, 2: 1.5}])) + + Or in hex: + + d9 d9f7 # tag(55799) + da 63740171 # tag(1668546929) + 81 # array(1) + a3 # map(3) + 00 # unsigned(0) + 67 # text(7) + 63757272656e74 # "current" + 06 # unsigned(6) + 03 # unsigned(3) + 02 # unsigned(2) + f9 3e00 # primitive(15872) + + At the representation level, the unique fingerprint for application/ + senml+cbor is composed of the 8 bytes d9d9f7da63740171 hex, after + which the unadorned CBOR data (81... for the SenML data) is appended. + +2.3. Enveloping Method: Labeled CBOR Sequence + + The Labeled CBOR Sequence method is appropriate for use with CBOR + Sequences as described in [RFC8742]. + + This method prepends a newly constructed, separate data item to the + CBOR Sequence, the _label_. + + The label is a nesting of two tags, similar to but distinct from the + CBOR Tag Wrapped methods, with an inner tag content of a constant + byte string. The total length of the label is 12 bytes. + + 1. The outer tag is the self-described CBOR Sequence tag, 55800. + + 2. The inner tag is a CBOR tag from the First Come First Served + space that uniquely identifies the CBOR Protocol. As with the + CBOR Tag Wrapped method, the use of a 4-byte tag that encodes + without zero bytes is encouraged. + + 3. The tag content is a 3-byte CBOR byte string containing + 0x42_4f_52 ('BOR' in diagnostic notation). + + The outer tag in the label identifies the file as being a CBOR + Sequence and does so with all the desirable properties explained in + Section 3.4.6 of [STD94]. Specifically, it does not appear to + conflict with any known file types, and it is not valid Unicode in + any Unicode encoding. + + The inner tag in the label identifies which CBOR Protocol is used, as + described above. + + The inner tag content is a constant byte string that is represented + as 0x43_42_4f_52, the ASCII characters "CBOR", which is the CBOR- + encoded data item for the 3-byte string 0x42_4f_52 ('BOR' in + diagnostic notation). + + The actual CBOR Protocol data then follows as the next data item(s) + in the CBOR Sequence, without a need for any further specific tag. + The use of a CBOR Sequence allows the application to trivially remove + the first item with the two tags. + + Should this file be reviewed by a human (directly in an editor or in + a hexdump display), it will include the ASCII characters "CBOR" + prominently. This value is also included simply because the inner + nested tag needs to tag something. + +2.3.1. Example + + To construct an example without registering a new tag, we use ID 272 + as assigned for application/missing-blocks+cbor-seq of the "CoAP + Content-Formats" registry [RFC9177]. + + Using the technique described in Appendix B, this translates into the + tag TN(272) = 1668547090. + + This is a somewhat contrived example, as this is not a media type + that is likely to be committed to storage. Nonetheless, with this + tag, missing blocks list 0, 8, 15 would be enveloped as (in + diagnostic notation): + + 55800(1668547090('BOR')), + 0, + 8, + 15 + + Or in hex: + + # CBOR sequence with 4 elements + d9 d9f8 # tag(55800) + da 63740212 # tag(1668547090) + 43 # bytes(3) + 424f52 # "BOR" + 00 # unsigned(0) + 08 # unsigned(8) + 0f # unsigned(15) + + At the representation level, the unique fingerprint for application/ + missing-blocks+cbor-seq is composed of the 8 bytes d9d9f8da63740212 + hex, after which the unadorned CBOR sequence (00... for the missing + block list given) is appended. + +3. Security Considerations + + This document provides a way to identify CBOR Protocol objects. + Clearly identifying CBOR contents in files may have a variety of + impacts. + + The most obvious is that it may allow malware to identify interesting + stored objects, and then exfiltrate or corrupt them. + + Protective applications (that check data) cannot rely on the + applications they try to protect (that use the data) to make exactly + the same decisions in recognizing file formats. (This is an instance + of a check versus use issue.) For example, end-point assessment + technologies should not solely rely on the labeling approaches + described in this document to decide whether to inspect a given file. + Similarly, depending on operating system configurations and related + properties of the execution environment, the labeling might influence + the default application used to process a file in a way that may not + be predicted by a protective application. + +4. IANA Considerations + + These IANA considerations are entirely about CBOR tags in the + "Concise Binary Object Representation (CBOR) Tags" registry + [IANA.CBOR-TAGS]. + + Section 4.1 documents the allocation for a CBOR tag to be used in a + CBOR sequence to identify the sequence (an example for using this tag + is found in Appendix C). Section 4.2 documents the allocation for a + CBOR tag to be used in the CBOR-Labeled Non-CBOR Data Enveloping + Method (Appendix D, which also shows examples). Section 4.3 + allocates a CBOR tag for each actual or potential CoAP Content-Format + number (examples are in Appendix B). + +4.1. Labeled CBOR Sequence Tag + + IANA has allocated tag 55800 for the Labeled CBOR Sequence Enveloping + Method from the "CBOR Tags" registry. IANA has updated this tag + registration to point to this document. + + This tag is from the First Come First Served area. + + The value has been picked to have properties similar to the 55799 tag + (Section 3.4.6 of [STD94]). + + The hexadecimal representation of the encoded tag head is 0xd9_d9_f8. + + This is not valid UTF-8: the first 0xd9 introduces a 3-byte sequence + in UTF-8, but the 0xd9 as the second value is not a valid second byte + for UTF-8. + + This is not valid UTF-16: the byte sequence 0xd9d9 (in either endian + order) puts this value into the UTF-16 high-half zone, which would + signal that this is a 32-bit Unicode value. However, the following + 16-bit big-endian value 0xf8_xx is not a valid second sequence + according to [RFC2781]. On a little-endian system, it would be + necessary to examine the fourth byte to determine if it is valid. + That next byte is determined by the subsequent encoding, and + Section 3.4.6 of [STD94] has already determined that no valid CBOR + encodings result in valid UTF-16. + + Data Item: + tagged byte string + + Semantics: + indicates that the file contains CBOR Sequences + +4.2. CBOR-Labeled Non-CBOR Data Tag + + IANA has allocated tag 55801 for the CBOR-Labeled Non-CBOR Data + Enveloping Method (Appendix D) from the "CBOR Tags" registry. IANA + updated this tag registration to point to this document. + + This tag is from the First Come First Served area. + + The value has been picked to have properties similar to the 55799 tag + (Section 3.4.6 of [STD94]). + + The hexadecimal representation of the encoded tag head is 0xd9_d9_f9. + + This is not valid UTF-8: the first 0xd9 introduces a 3-byte sequence + in UTF-8, but the 0xd9 as the second value is not a valid second byte + for UTF-8. + + This is not valid UTF-16: the byte sequence 0xd9d9 (in either endian + order) puts this value into the UTF-16 high-half zone, which would + signal that this is a 32-bit Unicode value. However, the following + 16-bit big-endian value 0xf9_xx is not a valid second sequence + according to [RFC2781]. On a little-endian system, it would be + necessary to examine the fourth byte to determine if it is valid. + That next byte is determined by the subsequent encoding, and + Section 3.4.6 of [STD94] has already determined that no valid CBOR + encodings result in valid UTF-16. + + Data Item: + tagged byte string + + Semantics: + indicates that the file starts with a CBOR-Labeled Non-CBOR Data + label. + +4.3. CBOR Tags for CoAP Content-Format Numbers + + IANA allocated the tag numbers 1668546817 (0x63740101) to 1668612095 + (0x6374ffff) as follows: + + Data Item: + byte string or any CBOR data item (see Appendix B) + + Semantics: + the representation of content-format ct < 65025 is indicated by + tag number + TN(ct) = 0x63740101 + (ct / 255) * 256 + ct % 255 + + Reference: + RFC 9277 + + The "CoAP Content-Formats" registry [IANA.CORE-PARAMETERS] is defined + in Section 12.3 of [RFC7252]. + +5. References + +5.1. Normative References + + [C] International Organization for Standardization, + "Information technology -- Programming languages -- C", + ISO/IEC 9899:2018, Fourth Edition, June 2018, + <https://www.iso.org/standard/74528.html>. + + [RFC8742] Bormann, C., "Concise Binary Object Representation (CBOR) + Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020, + <https://www.rfc-editor.org/info/rfc8742>. + + [STD94] Bormann, C. and P. Hoffman, "Concise Binary Object + Representation (CBOR)", STD 94, RFC 8949, + DOI 10.17487/RFC8949, December 2020, + <https://www.rfc-editor.org/info/std94>. + +5.2. Informative References + + [C509-CERT] + Mattsson, J. P., Selander, G., Raza, S., Höglund, J., and + M. Furuhed, "CBOR Encoded X.509 Certificates (C509 + Certificates)", Work in Progress, Internet-Draft, draft- + ietf-cose-cbor-encoded-cert-04, 10 July 2022, + <https://datatracker.ietf.org/doc/html/draft-ietf-cose- + cbor-encoded-cert-04>. + + [CoSWID] Birkholz, H., Fitzgerald-McKay, J., Schmidt, C., and D. + Waltermire, "Concise Software Identification Tags", Work + in Progress, Internet-Draft, draft-ietf-sacm-coswid-22, 20 + July 2022, <https://datatracker.ietf.org/doc/html/draft- + ietf-sacm-coswid-22>. + + [EAT] Lundblade, L., Mandyam, G., and J. O'Donoghue, "The Entity + Attestation Token (EAT)", Work in Progress, Internet- + Draft, draft-ietf-rats-eat-14, 10 July 2022, + <https://datatracker.ietf.org/doc/html/draft-ietf-rats- + eat-14>. + + [FILE] Wikipedia, "file (command)", 2 July 2022, + <https://en.wikipedia.org/w/ + index.php?title=File_(command)&oldid=1096086462>. + + [IANA.CBOR-TAGS] + IANA, "Concise Binary Object Representation (CBOR) Tags", + <https://www.iana.org/assignments/cbor-tags>. + + [IANA.CORE-PARAMETERS] + IANA, "Constrained RESTful Environments (CoRE) + Parameters", + <https://www.iana.org/assignments/core-parameters>. + + [MAGIC] Bell Labs, "archive (library) file format", Unix + Programmer's Manual, First Edition: File Formats, 3 + November 1971, + <https://www.bell-labs.com/usr/dmr/www/man51.pdf#page=4>. + + [RFC20] Cerf, V., "ASCII format for network interchange", STD 80, + RFC 20, DOI 10.17487/RFC0020, October 1969, + <https://www.rfc-editor.org/info/rfc20>. + + [RFC2781] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO + 10646", RFC 2781, DOI 10.17487/RFC2781, February 2000, + <https://www.rfc-editor.org/info/rfc2781>. + + [RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type + Specifications and Registration Procedures", BCP 13, + RFC 6838, DOI 10.17487/RFC6838, January 2013, + <https://www.rfc-editor.org/info/rfc6838>. + + [RFC7252] Shelby, Z., Hartke, K., and C. Bormann, "The Constrained + Application Protocol (CoAP)", RFC 7252, + DOI 10.17487/RFC7252, June 2014, + <https://www.rfc-editor.org/info/rfc7252>. + + [RFC8017] Moriarty, K., Ed., Kaliski, B., Jonsson, J., and A. Rusch, + "PKCS #1: RSA Cryptography Specifications Version 2.2", + RFC 8017, DOI 10.17487/RFC8017, November 2016, + <https://www.rfc-editor.org/info/rfc8017>. + + [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for + Writing an IANA Considerations Section in RFCs", BCP 26, + RFC 8126, DOI 10.17487/RFC8126, June 2017, + <https://www.rfc-editor.org/info/rfc8126>. + + [RFC8152] Schaad, J., "CBOR Object Signing and Encryption (COSE)", + RFC 8152, DOI 10.17487/RFC8152, July 2017, + <https://www.rfc-editor.org/info/rfc8152>. + + [RFC8428] Jennings, C., Shelby, Z., Arkko, J., Keranen, A., and C. + Bormann, "Sensor Measurement Lists (SenML)", RFC 8428, + DOI 10.17487/RFC8428, August 2018, + <https://www.rfc-editor.org/info/rfc8428>. + + [RFC8610] Birkholz, H., Vigano, C., and C. Bormann, "Concise Data + Definition Language (CDDL): A Notational Convention to + Express Concise Binary Object Representation (CBOR) and + JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, + June 2019, <https://www.rfc-editor.org/info/rfc8610>. + + [RFC9110] Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke, + Ed., "HTTP Semantics", STD 97, RFC 9110, + DOI 10.17487/RFC9110, June 2022, + <https://www.rfc-editor.org/info/rfc9110>. + + [RFC9177] Boucadair, M. and J. Shallow, "Constrained Application + Protocol (CoAP) Block-Wise Transfer Options Supporting + Robust Transmission", RFC 9177, DOI 10.17487/RFC9177, + March 2022, <https://www.rfc-editor.org/info/rfc9177>. + + [X.690] ITU-T, "Information technology - ASN.1 encoding rules: + Specification of Basic Encoding Rules (BER), Canonical + Encoding Rules (CER) and Distinguished Encoding Rules + (DER)", ITU-T Recommendation X.690, ISO/IEC 8825-1, + February 2021, <https://www.itu.int/rec/T-REC-X.690>. + +Appendix A. Advice to Protocol Designer + + This document introduces a choice between wrapping a single CBOR data + item into a pair of identifying CBOR tags or prepending an + identifying encoded CBOR data item (which, in turn, contains a pair + of identifying CBOR tags) to a CBOR Sequence (which might be a single + data item). + + Which should a protocol designer use? + + In this discussion, one assumes that there is an object stored in a + file, perhaps specified by a system operator in a configuration file. + + For example: a private key used in COSE operations, a public key/ + certificate in C509 [C509-CERT] or CBOR format, a recorded sensor + reading stored for later transmission, or a COVID-19 vaccination + certificate that needs to be displayed in QR code form. + + Both the Labeled CBOR Sequence and the wrapped tag can be trivially + removed by an application before sending the CBOR content out on the + wire. + + The Labeled CBOR Sequence can be slightly easier to remove as, in + most cases, CBOR parsers will return it as a unit and then return the + actual CBOR item, which could be anything at all and could include + CBOR tags that _do_ need to be sent on the wire. + + On the other hand, having the Labeled CBOR Sequence in the file + requires that all programs that expect to examine that file be able + to skip what appears to be a CBOR item with two tags nested around a + 3-byte byte string. The 3-byte entry is not of the format the + program would normally have processed, so it may be a surprise. On + the other hand, CBOR parsers are generally tolerant of tags that + appear: many of them will process extra tags, making unknown tags + available as meta information. A program that is not expecting those + tags may just ignore them. + + As an example of where there was a problem with previous security + systems, "PEM" format certificate files grew to be able to contain + multiple certificates by simple concatenation. The PKCS1 format + [RFC8017] could also contain a private key object followed by one or + more certificate objects, but only when in PEM format. Annoyingly, + when in binary DER format ([X.690], which like CBOR is self- + delimiting), concatenation of certificates was not compatible with + most programs as they did not expect to read more than one item in + the file. + + The use of CBOR Tag Wrapped format is easier to retrofit to an + existing format with existing and unchangeable stored format for a + single CBOR data item. This new sequence of tags is expected to be + trivially ignored by many existing programs when reading CBOR from + files or similar units of storage, even if the program only supports + decoding a single data item (and not a CBOR sequence). But, a naive + program might also then transmit the additional tags across the + network. Removing the CBOR Tag Wrapped format requires knowledge of + the two tags involved. Other tags present might be needed. + + For a representation matching a specific media-type that is carried + in a CBOR byte string, the byte string head will already have to be + removed for use as such a representation, so it should be easy to + remove the enclosing tag heads as well. This is of particular + interest with the predefined tags provided in Appendix B for media + types with CoAP Content-Format numbers. + + Some considerations in the form of survey questions follow. + +A.1. Is the on-wire format new? + + If the on-wire format is new, then it could be specified with the + CBOR Tag Wrapped format if the extra 8 bytes are not a problem. The + stored format is then identical to the on-wire format. + + If the 8 bytes are a problem on the wire (and they often are if CBOR + is being considered), then the Labeled CBOR Sequence format should be + adopted for the stored format. + +A.2. Can many items be trivially concatenated? + + If the programs that read the contents of the file already expect to + process all of the CBOR data items in the file (not just the first), + then the Labeled CBOR Sequence format may be easily retrofitted. + + The programs involved may throw errors or warnings on the Labeled + CBOR Sequence if they have not yet been updated, but this may not be + a problem. + + There are situations where multiple objects may be concatenated into + a single file. If each object is preceded by a Labeled CBOR Sequence + label, then there may be multiple such labels in the file. + + A protocol based on CBOR Sequences may specify that Labeled CBOR + Sequence labels can occur within a CBOR Sequence, possibly even to + switch to data items following in the sequence that are of a + different type. + + If the CBOR-Sequence-based protocol does not define the semantics for + or at least tolerate embedded labels, care must be taken when + concatenating Labeled CBOR Sequences to remove the label from all but + the first part. + + | As an example from legacy PEM-encoded PKIX certificates, many + | programs accept a series of PKIX certificates in a single file + | in order to set up a certificate chain. The file would contain + | not just the End-Entity (EE) certificate, but also any + | subordinate certification authorities (CAs) needed to validate + | the EE. This mechanism actually only works for PEM-encoded + | certificates, and not DER-encoded certificates. One of the + | reasons for this specification is to make sure that CBOR- + | encoded certificates do not suffer from this problem. + | + | As an example of mixing of types, some TLS server programs also + | can accept both their PEM-encoded private key and their PEM- + | encoded certificate in the same file. + + If only one item is ever expected in the file, the use of the Labeled + CBOR Sequence may present an implementation hurdle to programs that + previously just read a single data item and used it. + +A.3. Are there tags at the start? + + If the Protocol expects to use other tags at its top level, then the + use of the CBOR Tag Wrapped format may be easy to explain at the same + place in the protocol description. + +Appendix B. CBOR Tags for CoAP Content Formats + + Section 5.10.3 of [RFC7252] defines the concept of a Content-Format, + which is a short, 16-bit unsigned integer that identifies a specific + content type (media type plus (optionally) parameters), optionally + together with a content coding (see Section 8.4.1 of [RFC9110]). + + Outside of a transfer protocol that indicates the Content-Format for + a representation, it may be necessary to identify the Content-Format + of the representation when it is stored in a file, in firmware, or + when debugging. + + This specification allocates CBOR tag numbers 1668546817 (0x63740101) + to 1668612095 (0x6374FFFF) for the tagging of representations of + specific content formats. + + Using tags from this range, a byte string that is to be interpreted + as a representation of Content-Format number ct, with ct < 65025 + (255*255), can be identified by enclosing it in a tag with tag number + TN(ct) where: + + TN(ct) = 0x63740101 + (ct / 255) * 256 + ct % 255. + + (where +, *, / and % stand for integer addition, multiplication, + division, and remainder as in the programming language C [C].) + + | This formula avoids the use of zero bytes in the representation + | of the tag number. + | + | Note that no tag numbers are assigned for Content-Format + | numbers in the following range: + | + | 65025 ≤ ct ≤ 65535 + | + | (This range is in the range reserved for Experimental Use + | [RFC8126] by Section 12.3 of [RFC7252]. The overlap of 25 code + | points between this experimental range with the range this + | appendix defines tag numbers for can be used for experiments + | that want to employ a tag number.) + + Exceptionally, when used immediately as tag content of one of the + tags 55799, 55800, or 55801, the tag content is as follows: + + Tag 55799 (Section 2.2): One of: + + 1. The CBOR data item within the representation (without byte- + string wrapping). This only works for Content-Formats that + are represented by a single CBOR data item in identity + content-coding. + + 2. The data items in the CBOR sequence within the representation, + without byte string wrapping, but wrapped in a CBOR array. + This works for Content-Formats that are represented by a CBOR + sequence in identity content-coding. + + Tags 55800 (Section 2.3) or 55801 (Appendix D): the byte string + 'BOR', signifying that the representation of the given content- + format follows in the file, in the way defined for these tags. + +B.1. Content-Format Tag Examples + + The "CoAP Content-Formats" registry [IANA.CORE-PARAMETERS] defines + content formats that can be used as examples: + + * As discussed in Section 2.2.1, Content-Format ID 112 represents + the application/senml+cbor media type (no parameters). The + corresponding tag number is TN(112) = 1668546929. + + The following CDDL snippet can be used to identify application/ + senml+cbor representations: + + senml-cbor = #6.1668546929(bstr) + + Note that a byte string is used as the type of the tag content + because a media type representation in general can be any byte + string. + + * Content-Format ID 272 represents the application/missing- + blocks+cbor-seq media type, which is a CBOR sequence [RFC9177]. + + The corresponding tag number is TN(272) = 1668547090. + + The following CDDL snippet can be used to identify application/ + missing-blocks+cbor-seq representations as embedded in a CBOR byte + string: + + missing-blocks = #6.1668547090(bstr) + +Appendix C. Example from Openswan + + The Openswan IPsec project has a daemon ("pluto") and two control + programs ("addconn" and "whack"). They communicate via a Unix-domain + socket, over which a C-structure containing pointers to strings is + serialized using a bespoke mechanism. This is normally not a problem + as the structure is compiled by the same compiler; but when there are + upgrades, it is possible for the daemon and the control programs to + get out of sync by the bespoke serialization. As a result, there are + extra compensations to deal with shutting the daemon down. During + testing, it is sometimes the case that upgrades are backed out. + + In addition, when doing unit testing, the easiest way to load policy + is to use the normal policy-reading process, but that is not normally + loaded in the daemon. Instead, the IPC that is normally sent across + the wire is compiled, serialized, and placed in a file. The above + magic number is included in the file and on the IPC in order to + distinguish the "shutdown" command CBOR operation. + + In order to reduce the problems due to serialization, the + serialization is being changed to CBOR. Additionally, this change + allows the IPC to be described by CDDL and any implementation + language to be used that can encode CBOR. + + IANA has allocated the tag 1330664270 or 0x4f_50_53_4e for this + purpose. As a result, each file and each IPC is prefixed with a CBOR + Sequence tag. + + In diagnostic notation: + + 55800(1330664270(h'424F52')) + + Or in hex: + + d9 d9f8 # tag(55800) + da 4f50534e # tag(1330664270) + 43 # bytes(3) + 424f52 # "BOR" + +Appendix D. Using CBOR Labels for Non-CBOR Data + + The CBOR-Labeled Non-CBOR data method is appropriate for adding a + magic number to a Non-CBOR data format, particularly one that can be + described by a Content-Format tag (Appendix B). + + This method prepends a CBOR data item to the Non-CBOR data; this data + item is called the "header" and, similar to the Labeled CBOR-Sequence + label, consists of two nested tags around a constant byte string for + a total of 12 bytes. + + 1. The outer tag is the CBOR-Labeled Non-CBOR Data tag, 55801. + + 2. The inner tag is a CBOR tag from the First Come First Served + space that uniquely identifies the CBOR Protocol. As with CBOR + Tag Wrapped, the use of a 4-byte tag is encouraged that encodes + without zero bytes. + + 3. The tag content is a 3-byte CBOR byte string containing + 0x42_4F_52 ('BOR' in diagnostic notation). + + The outer tag in the label identifies the file as being prefixed by a + Non-CBOR data label and does so with all the desirable properties + explained in Section 3.4.6 of [STD94]. Specifically, it does not + appear to conflict with any known file types, and it is not valid + Unicode in any Unicode encoding. + + The inner tag in the label identifies which Non-CBOR Protocol is + used. + + The inner tag content is a constant byte string that is represented + as 0x43_42_4f_52, the ASCII characters "CBOR", which is the CBOR- + encoded data item for the 3-byte string 0x42_4f_52 ('BOR' in + diagnostic notation). + + The actual Non-CBOR Protocol data then follow directly appended to + the CBOR representation of the header. This allows the application + to trivially remove the header item with the two nested tags and the + byte string. + + As with the Labeled CBOR Sequence {#sequences}, this choice of the + tag content places the ASCII characters "CBOR" prominently into the + header. + +D.1. Content-Format Tag Examples + + The "CoAP Content-Formats" registry [IANA.CORE-PARAMETERS] defines + content formats that can be used as examples: + + * Content-Format ID 432 represents the application/td+json media + type (no parameters). + + The corresponding tag number is TN(432) = 1668547250. + + The following CDDL snippet can be used to identify a CBOR-Labeled + Non-CBOR data for application/td+json representations: + + td-json-header = #6.55801(#6.1668547250('BOR')) + + * Content-Format 11050 represents the application/json media type in + deflate content-coding. + + The corresponding tag number is TN(11050) = 1668557910. + + The following CDDL snippet can be used to identify a CBOR-Labeled + Non-CBOR data for application/json representations compressed in + deflate content-coding: + + json-deflate-header = #6.55801(#6.1668557910('BOR')) + +Acknowledgements + + The CBOR WG brainstormed this protocol on January 20, 2021 via a + number of productive email exchanges on the mailing list. + +Contributors + + Josef 'Jeff' Sipek + Email: jeffpc@josefsipek.net + + +Authors' Addresses + + Michael Richardson + Sandelman Software Works + Email: mcr+ietf@sandelman.ca + + + Carsten Bormann + Universität Bremen TZI + Postfach 330440 + D-28359 Bremen + Germany + Phone: +49-421-218-63921 + Email: cabo@tzi.org |