Diffstat (limited to 'doc/rfc/rfc9285.txt')
-rw-r--r-- | doc/rfc/rfc9285.txt | 383 |
1 files changed, 383 insertions, 0 deletions
diff --git a/doc/rfc/rfc9285.txt b/doc/rfc/rfc9285.txt
new file mode 100644
index 0000000..88aa0cd
--- /dev/null
+++ b/doc/rfc/rfc9285.txt
@@ -0,0 +1,383 @@
+
+
+
+
+Internet Engineering Task Force (IETF)                      P. Fältström
+Request for Comments: 9285                                        Netnod
+Category: Informational                                     F. Ljunggren
+ISSN: 2070-1721                                                    Kirei
+                                                          D.W. van Gulik
+                                                              Webweaving
+                                                             August 2022
+
+
+                        The Base45 Data Encoding
+
+Abstract
+
+   This document describes the Base45 encoding scheme, which is built
+   upon the Base64, Base32, and Base16 encoding schemes.
+
+Status of This Memo
+
+   This document is not an Internet Standards Track specification; it is
+   published for informational purposes.
+
+   This document is a product of the Internet Engineering Task Force
+   (IETF). It represents the consensus of the IETF community. It has
+   received public review and has been approved for publication by the
+   Internet Engineering Steering Group (IESG). Not all documents
+   approved by the IESG are candidates for any level of Internet
+   Standard; see Section 2 of RFC 7841.
+
+   Information about the current status of this document, any errata,
+   and how to provide feedback on it may be obtained at
+   https://www.rfc-editor.org/info/rfc9285.
+
+Copyright Notice
+
+   Copyright (c) 2022 IETF Trust and the persons identified as the
+   document authors. All rights reserved.
+
+   This document is subject to BCP 78 and the IETF Trust's Legal
+   Provisions Relating to IETF Documents
+   (https://trustee.ietf.org/license-info) in effect on the date of
+   publication of this document. Please review these documents
+   carefully, as they describe your rights and restrictions with respect
+   to this document. Code Components extracted from this document must
+   include Revised BSD License text as described in Section 4.e of the
+   Trust Legal Provisions and are provided without warranty as described
+   in the Revised BSD License.
+
+Table of Contents
+
+   1. Introduction
+   2. Conventions Used in This Document
+   3. Interpretation of Encoded Data
+   4. The Base45 Encoding
+     4.1. When to Use and Not Use Base45
+     4.2. The Alphabet Used in Base45
+     4.3. Encoding Examples
+     4.4. Decoding Example
+   5. IANA Considerations
+   6. Security Considerations
+   7. Normative References
+   Acknowledgements
+   Authors' Addresses
+
+1. Introduction
+
+   A QR code is used to encode text as a graphical image. Depending on
+   the characters used in the text, various encoding options for a QR
+   code exist, e.g., Numeric, Alphanumeric, and Byte mode. Even in Byte
+   mode, a typical QR code reader tries to interpret a byte sequence as
+   text encoded in UTF-8 or ISO/IEC 8859-1. Thus, QR codes cannot be
+   used to encode arbitrary binary data directly. Such data has to be
+   converted into an appropriate text before that text can be encoded
+   as a QR code. Compared to the already established Base64, Base32, and
+   Base16 encoding schemes that are described in [RFC4648], the Base45
+   scheme described in this document offers a more compact QR code
+   encoding.
+
+   Two important differences between those schemes and Base45 are the
+   key table and that padding with '=' is not required.
+
+2. Conventions Used in This Document
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
+   "OPTIONAL" in this document are to be interpreted as described in
+   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
+   capitals, as shown here.
+
+3. Interpretation of Encoded Data
+
+   Encoded data is to be interpreted as described in [RFC4648] with the
+   exception that a different alphabet is selected.
+
+4. The Base45 Encoding
+
+   QR codes have a limited ability to store binary data. In practice,
+   binary data have to be encoded in characters according to one of the
+   modes already defined in the standard for QR codes. The easiest mode
+   to use is called Alphanumeric mode (see Section 7.3.4 and Table 2 of
+   [ISO18004]). Unfortunately, Alphanumeric mode uses 45 different
+   characters, which implies that neither Base32 nor Base64 is a very
+   effective encoding.
+
+   A 45-character subset of US-ASCII is used; the 45 characters usable
+   in a QR code in Alphanumeric mode (see Section 7.3.4 and Table 2 of
+   [ISO18004]). Base45 encodes 2 bytes in 3 characters, compared to
+   Base64, which encodes 3 bytes in 4 characters.
+
+   For encoding, two bytes [a, b] MUST be interpreted as a number n in
+   base 256, i.e. as an unsigned integer over 16 bits so that the number
+   n = (a * 256) + b.
+
+   This number n is converted to base 45 [c, d, e] so that n = c + (d *
+   45) + (e * 45 * 45). Note the order of c, d and e which are chosen
+   so that the left-most [c] is the least significant.
+
+   The values c, d, and e are then looked up in Table 1 to produce a
+   three-character string. The process is reversed when decoding.
+
+   For encoding a single byte [a], it MUST be interpreted as a base 256
+   number, i.e. as an unsigned integer over 8 bits. That integer MUST
+   be converted to base 45 [c d] so that a = c + (45 * d). The values c
+   and d are then looked up in Table 1 to produce a two-character
+   string.
+
+   A byte string [a b c d ... x y z] with arbitrary content and
+   arbitrary length MUST be encoded as follows: From left to right pairs
+   of bytes MUST be encoded as described above. If the number of bytes
+   is even, then the encoded form is a string with a length that is
+   evenly divisible by 3. If the number of bytes is odd, then the last
+   (rightmost) byte MUST be encoded as two characters as described
+   above.
+
+   For decoding a Base45 encoded string the inverse operations are
+   performed.
+
+4.1. When to Use and Not Use Base45
+
+   If binary data is to be stored in a QR code, the suggested mechanism
+   is to use the Alphanumeric mode that uses 11 bits for 2 characters as
+   defined in Section 7.3.4 of [ISO18004]. The Extended Channel
+   Interpretation (ECI) mode indicator for this encoding is 0010.
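
The encoding procedure described in Section 4 can be sketched in Python roughly as follows. This is a minimal illustration and not part of the RFC: the names BASE45_ALPHABET and base45_encode are hypothetical, and the alphabet string simply lists the Table 1 characters in value order.

```python
# Hypothetical names; the string lists the Table 1 characters in value order.
BASE45_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def base45_encode(data: bytes) -> str:
    """Encode bytes as Base45: 2 bytes -> 3 characters, a trailing byte -> 2."""
    out = []
    # Walk the input from left to right in pairs of bytes [a, b].
    for i in range(0, len(data) - 1, 2):
        n = data[i] * 256 + data[i + 1]   # n = (a * 256) + b
        n, c = divmod(n, 45)              # c is the least significant digit
        e, d = divmod(n, 45)              # n = c + (d * 45) + (e * 45 * 45)
        out.append(BASE45_ALPHABET[c] + BASE45_ALPHABET[d] + BASE45_ALPHABET[e])
    # An odd-length input leaves one byte, which becomes two characters.
    if len(data) % 2 == 1:
        d, c = divmod(data[-1], 45)       # a = c + (45 * d)
        out.append(BASE45_ALPHABET[c] + BASE45_ALPHABET[d])
    return "".join(out)
```

Under these assumptions, base45_encode(b"AB") yields "BB8", matching encoding example 1 in Section 4.3.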
+
+   On the other hand, if the data is to be sent via some other
+   transport, a transport encoding suitable for that transport should be
+   used instead of Base45. For example, it is not recommended to first
+   encode data in Base45 and then encode the resulting string in Base64
+   if the data is to be sent via email. Instead, the Base45 encoding
+   should be removed, and the data itself should be encoded in Base64.
+
+4.2. The Alphabet Used in Base45
+
+   The Alphanumeric mode is defined to use 45 characters as specified in
+   this alphabet.
+
+  +=====+==========+=====+==========+=====+==========+=====+==========+
+  |Value| Encoding |Value| Encoding |Value| Encoding |Value| Encoding |
+  +=====+==========+=====+==========+=====+==========+=====+==========+
+  |   00|    0     |   12|    C     |   24|    O     |   36|  Space   |
+  +-----+----------+-----+----------+-----+----------+-----+----------+
+  |   01|    1     |   13|    D     |   25|    P     |   37|    $     |
+  +-----+----------+-----+----------+-----+----------+-----+----------+
+  |   02|    2     |   14|    E     |   26|    Q     |   38|    %     |
+  +-----+----------+-----+----------+-----+----------+-----+----------+
+  |   03|    3     |   15|    F     |   27|    R     |   39|    *     |
+  +-----+----------+-----+----------+-----+----------+-----+----------+
+  |   04|    4     |   16|    G     |   28|    S     |   40|    +     |
+  +-----+----------+-----+----------+-----+----------+-----+----------+
+  |   05|    5     |   17|    H     |   29|    T     |   41|    -     |
+  +-----+----------+-----+----------+-----+----------+-----+----------+
+  |   06|    6     |   18|    I     |   30|    U     |   42|    .     |
+  +-----+----------+-----+----------+-----+----------+-----+----------+
+  |   07|    7     |   19|    J     |   31|    V     |   43|    /     |
+  +-----+----------+-----+----------+-----+----------+-----+----------+
+  |   08|    8     |   20|    K     |   32|    W     |   44|    :     |
+  +-----+----------+-----+----------+-----+----------+-----+----------+
+  |   09|    9     |   21|    L     |   33|    X     |     |          |
+  +-----+----------+-----+----------+-----+----------+-----+----------+
+  |   10|    A     |   22|    M     |   34|    Y     |     |          |
+  +-----+----------+-----+----------+-----+----------+-----+----------+
+  |   11|    B     |   23|    N     |   35|    Z     |     |          |
+  +-----+----------+-----+----------+-----+----------+-----+----------+
+
+                      Table 1: The Base45 Alphabet
+
+4.3. Encoding Examples
+
+   It should be noted that although the examples are all text, Base45 is
+   an encoding for binary data where each octet can have any value
+   0-255.
+
+   Encoding example 1:
+
+      The string "AB" is the byte sequence [[65 66]]. If we look at all
+      16 bits, we get 65 * 256 + 66 = 16706. 16706 equals 11 + (11 *
+      45) + (8 * 45 * 45), so the sequence in base 45 is [11 11 8].
+      Referring to Table 1, we get the encoded string "BB8".
+
+                    +-----------+------------------+
+                    | AB        | Initial string   |
+                    +-----------+------------------+
+                    | [[65 66]] | Decimal value    |
+                    +-----------+------------------+
+                    | [16706]   | 16-bit value     |
+                    +-----------+------------------+
+                    | [11 11 8] | Value in base 45 |
+                    +-----------+------------------+
+                    | BB8       | Encoded string   |
+                    +-----------+------------------+
+
+                      Table 2: Example 1 in Detail
+
+   Encoding example 2:
+
+      The string "Hello!!" as ASCII is the byte sequence [[72 101] [108
+      108] [111 33] [33]]. If we look at this 16 bits at a time, we get
+      [18533 27756 28449 33]. Note the 33 for the last byte. When
+      looking at the values in base 45, we get [[38 6 9] [36 31 13] [9 2
+      14] [33 0]], where the last byte is represented by two values.
+      The resulting string "%69 VD92EX0" is created by looking up these
+      values in Table 1. Note that it includes a space.
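
The intermediate values quoted in encoding example 2 can be reproduced with a few lines of Python. This is only an illustrative check, not part of the RFC; the helper name base45_digits is hypothetical.

```python
data = b"Hello!!"

# Split into pairs of bytes from left to right; the last chunk is a single byte.
chunks = [data[i:i + 2] for i in range(0, len(data), 2)]

# Interpret each chunk as an unsigned big-endian integer (base 256).
values = [int.from_bytes(chunk, "big") for chunk in chunks]


def base45_digits(value: int, width: int) -> list:
    """Return `width` base 45 digits of `value`, least significant first."""
    digits = []
    for _ in range(width):
        value, digit = divmod(value, 45)
        digits.append(digit)
    return digits


# Two-byte chunks give three digits, the trailing single byte gives two.
digits = [base45_digits(v, 3 if len(c) == 2 else 2) for v, c in zip(values, chunks)]

print(values)  # [18533, 27756, 28449, 33]
print(digits)  # [[38, 6, 9], [36, 31, 13], [9, 2, 14], [33, 0]]
```

Looking each digit up in Table 1 then gives "%69", " VD", "92E", and "X0".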
+
+      +---------------------------------------+------------------+
+      | Hello!!                               | Initial string   |
+      +---------------------------------------+------------------+
+      | [[72 101] [108 108] [111 33] [33]]    | Decimal value    |
+      +---------------------------------------+------------------+
+      | [18533 27756 28449 33]                | 16-bit values    |
+      +---------------------------------------+------------------+
+      | [[38 6 9] [36 31 13] [9 2 14] [33 0]] | Value in base 45 |
+      +---------------------------------------+------------------+
+      | %69 VD92EX0                           | Encoded string   |
+      +---------------------------------------+------------------+
+
+                      Table 3: Example 2 in Detail
+
+   Encoding example 3:
+
+      The string "base-45" as ASCII is the byte sequence [[98 97] [115
+      101] [45 52] [53]]. If we look at this two bytes at a time, we
+      get [25185 29541 11572 53]. Note the 53 for the last byte. When
+      looking at the values in base 45, we get [[30 19 12] [21 26 14] [7
+      32 5] [8 1]] where the last byte is represented by two values.
+      Referring to Table 1, we get the encoded string "UJCLQE7W581".
+
+     +----------------------------------------+------------------+
+     | base-45                                | Initial string   |
+     +----------------------------------------+------------------+
+     | [[98 97] [115 101] [45 52] [53]]       | Decimal value    |
+     +----------------------------------------+------------------+
+     | [25185 29541 11572 53]                 | 16-bit values    |
+     +----------------------------------------+------------------+
+     | [[30 19 12] [21 26 14] [7 32 5] [8 1]] | Value in base 45 |
+     +----------------------------------------+------------------+
+     | UJCLQE7W581                            | Encoded string   |
+     +----------------------------------------+------------------+
+
+                      Table 4: Example 3 in Detail
+
+4.4. Decoding Example
+
+   Decoding example 1:
+
+      The string "QED8WEX0" represents, when looked up in Table 1, the
+      values [26 14 13 8 32 14 33 0]. We arrange the numbers in chunks
+      of three, except for the last one which can be two numbers, and
+      get [[26 14 13] [8 32 14] [33 0]]. In base 45, we get [26981
+      29798 33] where the bytes are [[105 101] [116 102] [33]]. If we
+      look at the ASCII values, we get the string "ietf!".
+
+       +-------------------------------+------------------------+
+       | QED8WEX0                      | Initial string         |
+       +-------------------------------+------------------------+
+       | [26 14 13 8 32 14 33 0]       | Looked up values       |
+       +-------------------------------+------------------------+
+       | [[26 14 13] [8 32 14] [33 0]] | Groups of three        |
+       +-------------------------------+------------------------+
+       | [26981 29798 33]              | Interpreted as base 45 |
+       +-------------------------------+------------------------+
+       | [[105 101] [116 102] [33]]    | 8-bit values           |
+       +-------------------------------+------------------------+
+       | ietf!                         | Decoded string         |
+       +-------------------------------+------------------------+
+
+                      Table 5: Example 4 in Detail
+
+5. IANA Considerations
+
+   This document has no IANA actions.
+
+6. Security Considerations
+
+   When implementing encoding and decoding it is important to be very
+   careful so that buffer overflow or similar issues do not occur. This
+   of course includes the calculations in base 45 and lookup in the
+   table of characters (Table 1). A decoder must also be robust
+   regarding input, including proper handling of any octet value 0-255,
+   including the NUL character (ASCII 0).
+
+   It should be noted that Base64 and some other encodings pad the
+   string so that the encoding starts with an aligned number of
+   characters while Base45 specifically avoids padding. Because of
+   this, special care has to be taken when an odd number of octets is to
+   be encoded. Similarly, care must be taken if the number of
+   characters to decode is not evenly divisible by 3.
+
+   Base encodings use a specific, reduced alphabet to encode binary
+   data. Non-alphabet characters could exist within base-encoded data,
+   caused by data corruption or by design. Non-alphabet characters may
+   be exploited as a "covert channel", where non-protocol data can be
+   sent for nefarious purposes. Non-alphabet characters might also be
+   sent in order to exploit implementation errors leading to, for
+   example, buffer overflow attacks.
+
+   Implementations MUST reject any input that is not a valid encoding.
+   For example, they MUST reject the input (encoded data) if it contains
+   characters outside the base alphabet (in Table 1) when interpreting
+   base-encoded data.
+
+   Even though a Base45-encoded string contains only characters from the
+   alphabet in Table 1, cases like the following have to be considered:
+   The string "FGW" represents 65535 (FFFF in base 16), which is a valid
+   encoding of 16 bits. A slightly different encoded string of the same
+   length, "GGW", would represent 65536 (10000 in base 16), which is
+   represented by more than 16 bits. Implementations MUST also reject
+   the encoded data if it contains a triplet of characters that, when
+   decoded, results in an unsigned integer that is greater than 65535
+   (FFFF in base 16).
+
+   It should be noted that the resulting string after encoding to Base45
+   might include non-URL-safe characters so if the URL including the
+   Base45 encoded data has to be URL-safe, one has to use
+   percent-encoding.
+
+7. Normative References
+
+   [ISO18004] ISO/IEC, "Information technology - Automatic
+              identification and data capture techniques - QR Code bar
+              code symbology specification", ISO/IEC 18004:2015,
+              February 2015, <https://www.iso.org/standard/62021.html>.
+
+   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
+              Requirement Levels", BCP 14, RFC 2119,
+              DOI 10.17487/RFC2119, March 1997,
+              <https://www.rfc-editor.org/info/rfc2119>.
+
+   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
+              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
+              <https://www.rfc-editor.org/info/rfc4648>.
+
+   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
+              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
+              May 2017, <https://www.rfc-editor.org/info/rfc8174>.
+
+Acknowledgements
+
+   The authors thank Mark Adler, Anders Ahl, Alan Barrett, Sam Spens
+   Clason, Alfred Fiedler, Tomas Harreveld, Erik Hellman, Joakim
+   Jardenberg, Michael Joost, Erik Kline, Christian Landgren, Anders
+   Lowinger, Mans Nilsson, Jakob Schlyter, Peter Teufl, and Gaby
+   Whitehead for the feedback. Also, everyone who has been working with
+   Base64 over a long period of years and has proven the implementations
+   are stable.
+
+Authors' Addresses
+
+   Patrik Fältström
+   Netnod
+   Email: paf@netnod.se
+
+
+   Fredrik Ljunggren
+   Kirei
+   Email: fredrik@kirei.se
+
+
+   Dirk-Willem van Gulik
+   Webweaving
+   Email: dirkx@webweaving.org
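
As a closing illustration of the decoding rules in Section 4 and the rejection requirements in Section 6, a strict decoder might look roughly like the Python sketch below. It is not part of the RFC and the names are hypothetical. Rejecting non-alphabet characters and triplets above 65535 follows Section 6 directly; rejecting a dangling single character and a two-character group above 255 is an assumption derived from the general rule that any input that is not a valid encoding must be rejected.

```python
# Hypothetical names; the string lists the Table 1 characters in value order.
BASE45_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"
_VALUES = {ch: i for i, ch in enumerate(BASE45_ALPHABET)}


def base45_decode(text: str) -> bytes:
    """Decode a Base45 string, raising ValueError on any invalid input."""
    # Assumption: a length of 3k + 1 can never come from the encoder.
    if len(text) % 3 == 1:
        raise ValueError("invalid Base45 length")
    try:
        digits = [_VALUES[ch] for ch in text]
    except KeyError as exc:
        # Section 6: characters outside the alphabet MUST be rejected.
        raise ValueError(f"character outside the Base45 alphabet: {exc.args[0]!r}") from None
    out = bytearray()
    for i in range(0, len(digits), 3):
        group = digits[i:i + 3]
        if len(group) == 3:
            n = group[0] + group[1] * 45 + group[2] * 45 * 45
            # Section 6: a triplet above 65535 (FFFF) MUST be rejected.
            if n > 0xFFFF:
                raise ValueError("triplet decodes to a value above 65535")
            out += n.to_bytes(2, "big")
        else:
            n = group[0] + group[1] * 45
            # Assumption: a final pair must fit in a single byte.
            if n > 0xFF:
                raise ValueError("pair decodes to a value above 255")
            out.append(n)
    return bytes(out)
```

With this sketch, base45_decode("QED8WEX0") returns b"ietf!" as in Section 4.4, "FGW" decodes to two 0xFF bytes, and "GGW" raises ValueError.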