diff options
author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc8746.txt | |
parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc8746.txt')
-rw-r--r-- | doc/rfc/rfc8746.txt | 665 |
1 files changed, 665 insertions, 0 deletions
diff --git a/doc/rfc/rfc8746.txt b/doc/rfc/rfc8746.txt new file mode 100644 index 0000000..6b31034 --- /dev/null +++ b/doc/rfc/rfc8746.txt @@ -0,0 +1,665 @@ + + + + +Internet Engineering Task Force (IETF) C. Bormann, Ed. +Request for Comments: 8746 Universität Bremen TZI +Category: Standards Track February 2020 +ISSN: 2070-1721 + + + Concise Binary Object Representation (CBOR) Tags for Typed Arrays + +Abstract + + The Concise Binary Object Representation (CBOR), as defined in RFC + 7049, is a data format whose design goals include the possibility of + extremely small code size, fairly small message size, and + extensibility without the need for version negotiation. + + This document makes use of this extensibility to define a number of + CBOR tags for typed arrays of numeric data, as well as additional + tags for multi-dimensional and homogeneous arrays. It is intended as + the reference document for the IANA registration of the CBOR tags + defined. + +Status of This Memo + + This is an Internet Standards Track document. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Further information on + Internet Standards is available in Section 2 of RFC 7841. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + https://www.rfc-editor.org/info/rfc8746. + +Copyright Notice + + Copyright (c) 2020 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + +Table of Contents + + 1. Introduction + 1.1. Terminology + 2. Typed Arrays + 2.1. Types of Numbers + 3. Additional Array Tags + 3.1. Multi-dimensional Array + 3.1.1. Row-Major Order + 3.1.2. Column-Major Order + 3.2. Homogeneous Array + 4. Discussion + 5. CDDL Typenames + 6. IANA Considerations + 7. Security Considerations + 8. References + 8.1. Normative References + 8.2. Informative References + Acknowledgements + Contributors + Author's Address + +1. Introduction + + The Concise Binary Object Representation (CBOR) [RFC7049] provides + for the interchange of structured data without a requirement for a + pre-agreed schema. [RFC7049] defines a basic set of data types as + well as a tagging mechanism that enables extending the set of data + types supported via an IANA registry. + + Recently, a simple form of typed arrays of numeric data has received + interest both in the Web graphics community [TypedArray] and in the + JavaScript specification (see Section 22.2 (https://www.ecma- + international.org/ecma-262/10.0/index.html#sec-typedarray-objects) of + [ECMA-ES10]) as well as in corresponding implementations + [ArrayBuffer]. + + Since these typed arrays may carry significant amounts of data, there + is interest in interchanging them in CBOR without the need of lengthy + conversion of each number in the array. This can also save space + overhead with encoding a type for each element of an array. + + This document defines a number of interrelated CBOR tags that cover + these typed arrays, as well as additional tags for multi-dimensional + and homogeneous arrays. It is intended as the reference document for + the IANA registration of the tags defined. + + Note that an application that generates CBOR with these tags has + considerable freedom in choosing variants (e.g., with respect to + endianness, embedded type (signed vs. unsigned), and number of bits + per element) or whether a tag defined in this specification is used + at all instead of more basic CBOR. In contrast to representation + variants of single CBOR numbers, there is no representation that + could be identified as "preferred". If deterministic encoding is + desired in a CBOR-based protocol making use of these tags, the + protocol has to define which of the encoding variants are used for + each individual case. + +1.1. Terminology + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and + "OPTIONAL" in this document are to be interpreted as described in + BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all + capitals, as shown here. + + The term "byte" is used in its now-customary sense as a synonym for + "octet". Where bit arithmetic is explained, this document uses + familiar notation from the programming language C [C] (including + C++14's 0bnnn binary literals [CPlusPlus]) with the exception of the + operator "**", which stands for exponentiation. + + The term "array" is used in a general sense in this document unless + further specified. The term "classical CBOR array" describes an + array represented with CBOR major type 4. A "homogeneous array" is + an array of elements that are all the same type (the term is neutral + as to whether that is a representation type or an application data + model type). + + The terms "big endian" and "little endian" are used to indicate a + most significant byte first (MSB first) representation of integers + and a least significant byte first (LSB first) representation, + respectively. + +2. Typed Arrays + + Typed arrays are homogeneous arrays of numbers, all of which are + encoded in a single form of binary representation. The concatenation + of these representations is encoded as a single CBOR byte string + (major type 2), enclosed by a single tag indicating the type and + encoding of all the numbers represented in the byte string. + +2.1. Types of Numbers + + Three classes of numbers are of interest: unsigned integers (uint), + signed integers (two's complement, sint), and IEEE 754 binary + floating point numbers (which are always signed). For each of these + classes, there are multiple representation lengths in active use: + + +-----------+--------+--------+-----------+ + | Length ll | uint | sint | float | + +===========+========+========+===========+ + | 0 | uint8 | sint8 | binary16 | + +-----------+--------+--------+-----------+ + | 1 | uint16 | sint16 | binary32 | + +-----------+--------+--------+-----------+ + | 2 | uint32 | sint32 | binary64 | + +-----------+--------+--------+-----------+ + | 3 | uint64 | sint64 | binary128 | + +-----------+--------+--------+-----------+ + + Table 1: Length Values + + Here, sintN stands for a signed integer of exactly N bits (for + instance, sint16), and uintN stands for an unsigned integer of + exactly N bits (for instance, uint32). The name binaryN stands for + the number form of the same name defined in IEEE 754 [IEEE754]. + + Since one objective of these tags is to be able to directly ship the + ArrayBuffers underlying the Typed Arrays without re-encoding them, + and these may be either in big-endian (network byte order) or in + little-endian form, we need to define tags for both variants. + + In total, this leads to 24 variants. In the tag, we need to express + the choice between integer and floating point, the signedness (for + integers), the endianness, and one of the four length values. + + In order to simplify implementation, a range of tags is being + allocated that allows retrieving all this information from the bits + of the tag: tag values from 64 to 87. + + The value is split up into 5 bit fields: 0b010, f, s, e, and ll as + detailed in Table 2. + + +-------+-------------------------------------------------------+ + | Field | Use | + +=======+=======================================================+ + | 0b010 | the constant bits 0, 1, 0 | + +-------+-------------------------------------------------------+ + | f | 0 for integer, 1 for float | + +-------+-------------------------------------------------------+ + | s | 0 for float or unsigned integer, 1 for signed integer | + +-------+-------------------------------------------------------+ + | e | 0 for big endian, 1 for little endian | + +-------+-------------------------------------------------------+ + | ll | A number for the length (Table 1). | + +-------+-------------------------------------------------------+ + + Table 2: Bit Fields in the Low 8 Bits of the Tag + + The number of bytes in each array element can then be calculated by + "2**(f + ll)" (or "1 << (f + ll)" in a typical programming language). + (Notice that 0f and ll are the two least significant bits, + respectively, of each 4-bit nibble in the byte.) + + In the CBOR representation, the total number of elements in the array + is not expressed explicitly but is implied from the length of the + byte string and the length of each representation. It can be + computed from the length, in bytes, of the byte string comprising the + representation of the array by inverting the previous formula: + "bytelength >> (f + ll)". + + For the uint8/sint8 values, the endianness is redundant. Only the + tag for the big-endian variant is used and assigned as such. The tag + that would signify the little-endian variant of sint8 MUST NOT be + used; its tag number is marked as reserved. As a special case, the + tag that would signify the little-endian variant of uint8 is instead + assigned to signify that the numbers in the array are using clamped + conversion from integers, as described in more detail in + Section 7.1.11 (http://www.ecma-international.org/ecma-262/6.0/#sec- + touint8clamp) of the ES10 JavaScript specification ("ToUint8Clamp") + [ECMA-ES10]; the assumption here is that a program-internal + representation of this array after decoding would be marked this way + for further processing providing "roundtripping" of JavaScript-typed + arrays through CBOR. + + IEEE 754 binary floating numbers are always signed. Therefore, for + the float variants ("f" == 1), there is no need to distinguish + between signed and unsigned variants; the "s" bit is always zero. + The tag numbers where "s" would be one (which would have tag values + 88 to 95) remain free to use by other specifications. + +3. Additional Array Tags + + This specification defines three additional array tags. The Multi- + dimensional Array tags can be combined with classical CBOR arrays as + well as with Typed Arrays in order to build multi-dimensional arrays + with constant numbers of elements in the sub-arrays. The Homogeneous + Array tag can be used as a signal by an application to identify a + classical CBOR array as a homogeneous array, even when a Typed Array + does not apply. + +3.1. Multi-dimensional Array + + A multi-dimensional array is represented as a tagged array that + contains two (one-dimensional) arrays. The first array defines the + dimensions of the multi-dimensional array (in the sequence of outer + dimensions towards inner dimensions) while the second array + represents the contents of the multi-dimensional array. If the + second array is itself tagged as a Typed Array, then the element type + of the multi-dimensional array is known to be the same type as that + of the Typed Array. + + Two tags are defined by this document: one for elements arranged in + row-major order and another for column-major order [RowColMajor]. + +3.1.1. Row-Major Order + + Tag: 40 + + Data Item: Array (major type 4) of two arrays: one array (major type + 4) of dimensions, which are unsigned integers distinct from zero; + and one array (any one of a CBOR array of major type 4, a Typed + Array, or a Homogeneous Array) of elements. + + Data in the second array consists of consecutive values where the + last dimension is considered contiguous (row-major order). + + Figure 1 shows a declaration of a two-dimensional array in the C + language, a representation of that in CBOR using both a multi- + dimensional array tag and a typed array tag. + + uint16_t a[2][3] = { + {2, 4, 8}, /* row 0 */ + {4, 16, 256}, + }; + + <Tag 40> # multi-dimensional array tag + 82 # array(2) + 82 # array(2) + 02 # unsigned(2) 1st Dimension + 03 # unsigned(3) 2nd Dimension + <Tag 65> # uint16 array + 4c # byte string(12) + 0002 # unsigned(2) + 0004 # unsigned(4) + 0008 # unsigned(8) + 0004 # unsigned(4) + 0010 # unsigned(16) + 0100 # unsigned(256) + + Figure 1: Multi-dimensional Array in C and CBOR + + Figure 2 shows the same two-dimensional array using the multi- + dimensional array tag in conjunction with a basic CBOR array (which, + with the small numbers chosen for the example, happens to be + shorter). + + <Tag 40> # multi-dimensional array tag + 82 # array(2) + 82 # array(2) + 02 # unsigned(2) 1st Dimension + 03 # unsigned(3) 2nd Dimension + 86 # array(6) + 02 # unsigned(2) + 04 # unsigned(4) + 08 # unsigned(8) + 04 # unsigned(4) + 10 # unsigned(16) + 19 0100 # unsigned(256) + + Figure 2: Multi-dimensional Array Using Basic CBOR Array + +3.1.2. Column-Major Order + + The multi-dimensional arrays specified in the previous sub-subsection + are in "row major" order, which is the preferred order for the + purposes of this specification. An analogous representation that + uses "column major" order arrays is provided in this subsection under + the tag 1040, as illustrated in Figure 3. + + Tag: 1040 + + Data Item: The same as tag 40, except the data in the second array + consists of consecutive values where the first dimension is + considered contiguous (column-major order). + + <Tag 1040> # multi-dimensional array tag, column-major order + 82 # array(2) + 82 # array(2) + 02 # unsigned(2) 1st Dimension + 03 # unsigned(3) 2nd Dimension + 86 # array(6) + 02 # unsigned(2) + 04 # unsigned(4) + 04 # unsigned(4) + 10 # unsigned(16) + 08 # unsigned(8) + 19 0100 # unsigned(256) + + Figure 3: Multi-dimensional Array Using Basic CBOR Array, Column- + Major Order + +3.2. Homogeneous Array + + Tag: 41 + + Data Item: Array (major type 4) + + This tag identifies the classical CBOR array (a one-dimensional + array) tagged by it as a homogeneous array, that is, it has elements + that are all of the same application model data type. The element + type of the array is therefore determined by the application model + data type of the first array element. + + This can be used in application data models that apply specific + semantics to homogeneous arrays. Also, in certain cases, + implementations in strongly typed languages may be able to create + native homogeneous arrays of specific types instead of ordered lists + while decoding. Which CBOR data items constitute elements of the + same application type is specific to the application. + + Figure 4 shows an example for a homogeneous array of booleans in C++ + [CPlusPlus] and CBOR. + + bool boolArray[2] = { true, false }; + + <Tag 41> # Homogeneous Array Tag + 82 #array(2) + F5 # true + F4 # false + + Figure 4: Homogeneous Array in C++ and CBOR + + Figure 5 extends the example with a more complex structure. + + typedef struct { + bool active; + int value; + } foo; + foo myArray[2] = { {true, 3}, {true, -4} }; + + <Tag 41> + 82 # array(2) + 82 # array(2) + F5 # true + 03 # 3 + 82 # array(2) + F5 # true + 23 # -4 + + Figure 5: Homogeneous Array in C++ and CBOR + +4. Discussion + + Support for both little- and big-endian representation may seem out + of character with CBOR, which is otherwise fully big endian. This + support is in line with the intended use of the typed arrays and the + objective not to require conversion of each array element. + + This specification allocates a sizable chunk out of the single-byte + tag space. This use of code point space is justified by the wide use + of typed arrays in data interchange. + + Providing a column-major order variant of the multi-dimensional array + may seem superfluous to some and useful to others. It is cheap to + define the additional tag so that it is available when actually + needed. Allocating it out of a different number space makes the + preference for row-major evident. + + Applying a Homogeneous Array tag to a Typed Array would usually be + redundant and is therefore not provided by the present specification. + + +5. CDDL Typenames + + For use with CDDL [RFC8610], the typenames defined in Figure 6 are + recommended: + + ta-uint8 = #6.64(bstr) + ta-uint16be = #6.65(bstr) + ta-uint32be = #6.66(bstr) + ta-uint64be = #6.67(bstr) + ta-uint8-clamped = #6.68(bstr) + ta-uint16le = #6.69(bstr) + ta-uint32le = #6.70(bstr) + ta-uint64le = #6.71(bstr) + ta-sint8 = #6.72(bstr) + ta-sint16be = #6.73(bstr) + ta-sint32be = #6.74(bstr) + ta-sint64be = #6.75(bstr) + ; reserved: #6.76(bstr) + ta-sint16le = #6.77(bstr) + ta-sint32le = #6.78(bstr) + ta-sint64le = #6.79(bstr) + ta-float16be = #6.80(bstr) + ta-float32be = #6.81(bstr) + ta-float64be = #6.82(bstr) + ta-float128be = #6.83(bstr) + ta-float16le = #6.84(bstr) + ta-float32le = #6.85(bstr) + ta-float64le = #6.86(bstr) + ta-float128le = #6.87(bstr) + homogeneous<array> = #6.41(array) + multi-dim<dim, array> = #6.40([dim, array]) + multi-dim-column-major<dim, array> = #6.1040([dim, array]) + + Figure 6: Recommended Typenames for CDDL + + +6. IANA Considerations + + IANA has allocated the tags in Table 3 using this document as the + specification reference. (The reserved value is for a future + revision of typed array tags.) + + The allocations were assigned from the "specification required" space + (24..255) with the exception of 1040, which was assigned from the + "first come first served" space (256..). + + +------+----------------------+----------------------------+ + | Tag | Data Item | Semantics | + +======+======================+============================+ + | 40 | array of two arrays* | Multi-dimensional Array, | + | | | row-major order | + +------+----------------------+----------------------------+ + | 41 | array | Homogeneous Array | + +------+----------------------+----------------------------+ + | 64 | byte string | uint8 Typed Array | + +------+----------------------+----------------------------+ + | 65 | byte string | uint16, big endian, Typed | + | | | Array | + +------+----------------------+----------------------------+ + | 66 | byte string | uint32, big endian, Typed | + | | | Array | + +------+----------------------+----------------------------+ + | 67 | byte string | uint64, big endian, Typed | + | | | Array | + +------+----------------------+----------------------------+ + | 68 | byte string | uint8 Typed Array, clamped | + | | | arithmetic | + +------+----------------------+----------------------------+ + | 69 | byte string | uint16, little endian, | + | | | Typed Array | + +------+----------------------+----------------------------+ + | 70 | byte string | uint32, little endian, | + | | | Typed Array | + +------+----------------------+----------------------------+ + | 71 | byte string | uint64, little endian, | + | | | Typed Array | + +------+----------------------+----------------------------+ + | 72 | byte string | sint8 Typed Array | + +------+----------------------+----------------------------+ + | 73 | byte string | sint16, big endian, Typed | + | | | Array | + +------+----------------------+----------------------------+ + | 74 | byte string | sint32, big endian, Typed | + | | | Array | + +------+----------------------+----------------------------+ + | 75 | byte string | sint64, big endian, Typed | + | | | Array | + +------+----------------------+----------------------------+ + | 76 | byte string | (reserved) | + +------+----------------------+----------------------------+ + | 77 | byte string | sint16, little endian, | + | | | Typed Array | + +------+----------------------+----------------------------+ + | 78 | byte string | sint32, little endian, | + | | | Typed Array | + +------+----------------------+----------------------------+ + | 79 | byte string | sint64, little endian, | + | | | Typed Array | + +------+----------------------+----------------------------+ + | 80 | byte string | IEEE 754 binary16, big | + | | | endian, Typed Array | + +------+----------------------+----------------------------+ + | 81 | byte string | IEEE 754 binary32, big | + | | | endian, Typed Array | + +------+----------------------+----------------------------+ + | 82 | byte string | IEEE 754 binary64, big | + | | | endian, Typed Array | + +------+----------------------+----------------------------+ + | 83 | byte string | IEEE 754 binary128, big | + | | | endian, Typed Array | + +------+----------------------+----------------------------+ + | 84 | byte string | IEEE 754 binary16, little | + | | | endian, Typed Array | + +------+----------------------+----------------------------+ + | 85 | byte string | IEEE 754 binary32, little | + | | | endian, Typed Array | + +------+----------------------+----------------------------+ + | 86 | byte string | IEEE 754 binary64, little | + | | | endian, Typed Array | + +------+----------------------+----------------------------+ + | 87 | byte string | IEEE 754 binary128, little | + | | | endian, Typed Array | + +------+----------------------+----------------------------+ + | 1040 | array of two arrays* | Multi-dimensional Array, | + | | | column-major order | + +------+----------------------+----------------------------+ + + Table 3: Values for Tags + + *40 or 1040 data item: The second element of the outer array in the + data item is a native CBOR array (major type 4) or Typed Array (one + of tag 64..87) + +7. Security Considerations + + The security considerations of [RFC7049] apply; special attention is + drawn to the second paragraph of Section 8 of [RFC7049]. + + The tag for homogeneous arrays makes a promise about its tagged data + item, which a maliciously constructed CBOR input can then choose to + ignore. As always, the decoder therefore has to ensure that it is + not driven into an undefined state by array elements that do not + fulfill the promise, and that it does continue to fulfill its API + contract in this case as well. + + As with all formats that are used for data interchange, an attacker + may have control over the shape of the data delivered as input to the + application, which therefore needs to validate that shape before it + makes it the basis of its further processing. One unique aspect that + typed arrays add to this is that an attacker might substitute a + Uint8ClampedArray for where the application expects a Uint8Array, or + vice versa, potentially leading to very different (and unexpected) + processing semantics of the in-memory data structures constructed. + Applications that could be affected by this will therefore need to be + careful about making this distinction in their input validation. + + +8. References + +8.1. Normative References + + [C] International Organization for Standardization, + "Information technology - Programming languages - C", ISO/ + IEC 9899:2018, Fourth Edition, June 2018. + + [CPlusPlus] + International Organization for Standardization, + "Programming languages - C++", ISO/IEC 14882:2017, Fifth + Edition, December 2017. + + [ECMA-ES10] + ECMA International, "ECMAScript 2019 Language + Specification", Standard ECMA-262 10th Edition, June 2019, + <https://www.ecma-international.org/ecma-262/10.0/ + index.html>. + + [IEEE754] IEEE, "IEEE Standard for Floating-Point Arithmetic", + IEEE 754-2019, DOI 10.1109/IEEESTD.2019.8766229, + <https://ieeexplore.ieee.org/document/8766229>. + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, + DOI 10.17487/RFC2119, March 1997, + <https://www.rfc-editor.org/info/rfc2119>. + + [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object + Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, + October 2013, <https://www.rfc-editor.org/info/rfc7049>. + + [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC + 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, + May 2017, <https://www.rfc-editor.org/info/rfc8174>. + + [RFC8610] Birkholz, H., Vigano, C., and C. Bormann, "Concise Data + Definition Language (CDDL): A Notational Convention to + Express Concise Binary Object Representation (CBOR) and + JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, + June 2019, <https://www.rfc-editor.org/info/rfc8610>. + +8.2. Informative References + + [ArrayBuffer] + Mozilla Developer Network, "JavaScript typed arrays", June + 2010, <https://developer.mozilla.org/en- + US/docs/Web/JavaScript/Typed_arrays>. + + [RowColMajor] + Wikipedia, "Row- and column-major order", September 2019, + <https://en.wikipedia.org/w/index.php?title=Row- + _and_column-major_order&oldid=917905325>. + + [TypedArray] + Vukicevic, V. and K. Russell, "Typed Array Specification", + February 2011, + <https://web.archive.org/web/20110207024413/ + http://www.khronos.org/registry/typedarray/specs/latest/>. + +Acknowledgements + + Jim Schaad provided helpful comments and reminded us that column- + major order still is in use. Jeffrey Yaskin helped improve the + definition of homogeneous arrays. IANA helped correct an error in a + previous draft version. Francesca Palombini acted as Shepherd, and + Alexey Melnikov as responsible Area Director. Elwyn Davies as Gen- + ART reviewer and IESG members Martin Vigoureux, Adam Roach, Roman + Danyliw, and Benjamin Kaduk helped in finding further improvements to + the text; thanks also to the other reviewers. + +Contributors + + The initial draft version of this specification was written by + Johnathan Roatch <roatch@gmail.com>. Many thanks for getting this + ball rolling. + + Glenn Engel suggested the tags for multi-dimensional arrays and + homogeneous arrays. + +Author's Address + + Carsten Bormann (editor) + Universität Bremen TZI + Postfach 330440 + D-28359 Bremen + Germany + + Phone: +49-421-218-63921 + Email: cabo@tzi.org |