From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001 From: Thomas Voss Date: Wed, 27 Nov 2024 20:54:24 +0100 Subject: doc: Add RFC documents --- doc/rfc/rfc3548.txt | 731 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 731 insertions(+) create mode 100644 doc/rfc/rfc3548.txt (limited to 'doc/rfc/rfc3548.txt') diff --git a/doc/rfc/rfc3548.txt b/doc/rfc/rfc3548.txt new file mode 100644 index 0000000..f50f632 --- /dev/null +++ b/doc/rfc/rfc3548.txt @@ -0,0 +1,731 @@ + + + + + + +Network Working Group S. Josefsson, Ed. +Request for Comments: 3548 July 2003 +Category: Informational + + + The Base16, Base32, and Base64 Data Encodings + +Status of this Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard of any kind. Distribution of this + memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (2003). All Rights Reserved. + +Abstract + + This document describes the commonly used base 64, base 32, and base + 16 encoding schemes. It also discusses the use of line-feeds in + encoded data, use of padding in encoded data, use of non-alphabet + characters in encoded data, and use of different encoding alphabets. + +Table of Contents + + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2 + 2. Implementation discrepancies . . . . . . . . . . . . . . . . . 2 + 2.1. Line feeds in encoded data . . . . . . . . . . . . . . . 2 + 2.2. Padding of encoded data . . . . . . . . . . . . . . . . 3 + 2.3. Interpretation of non-alphabet characters in encoded + data . . . . . . . . . . . . . . . . . . . . . . . . . . 3 + 2.4. Choosing the alphabet . . . . . . . . . . . . . . . . . 3 + 3. Base 64 Encoding . . . . . . . . . . . . . . . . . . . . . . . 4 + 4. Base 64 Encoding with URL and Filename Safe Alphabet . . . . . 6 + 5. Base 32 Encoding . . . . . . . . . . . . . . . . . . . . . . . 6 + 6. Base 16 Encoding . . . . . . . . . . . . . . . . . . . . . . . 8 + 7. Illustrations and examples . . . . . . . . . . . . . . . . . . 9 + 8. Security Considerations . . . . . . . . . . . . . . . . . . . 10 + 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 + 9.1. Normative References . . . . . . . . . . . . . . . . . . 11 + 9.2. Informative References . . . . . . . . . . . . . . . . . 11 + 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 + 11. Editor's Address . . . . . . . . . . . . . . . . . . . . . . . 12 + 12. Full Copyright Statement . . . . . . . . . . . . . . . . . . . 13 + + + + + + +Josefsson Informational [Page 1] + +RFC 3548 The Base16, Base32, and Base64 Data Encodings July 2003 + + +1. Introduction + + Base encoding of data is used in many situations to store or transfer + data in environments that, perhaps for legacy reasons, are restricted + to only US-ASCII [9] data. Base encoding can also be used in new + applications that do not have legacy restrictions, simply because it + makes it possible to manipulate objects with text editors. + + In the past, different applications have had different requirements + and thus sometimes implemented base encodings in slightly different + ways. Today, protocol specifications sometimes use base encodings in + general, and "base64" in particular, without a precise description or + reference. MIME [3] is often used as a reference for base64 without + considering the consequences for line-wrapping or non-alphabet + characters. The purpose of this specification is to establish common + alphabet and encoding considerations. This will hopefully reduce + ambiguity in other documents, leading to better interoperability. + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in RFC 2119 [1]. + +2. Implementation discrepancies + + Here we discuss the discrepancies between base encoding + implementations in the past, and where appropriate, mandate a + specific recommended behavior for the future. + +2.1. Line feeds in encoded data + + MIME [3] is often used as a reference for base 64 encoding. However, + MIME does not define "base 64" per se, but rather a "base 64 + Content-Transfer-Encoding" for use within MIME. As such, MIME + enforces a limit on line length of base 64 encoded data to 76 + characters. MIME inherits the encoding from PEM [2] stating it is + "virtually identical", however PEM uses a line length of 64 + characters. The MIME and PEM limits are both due to limits within + SMTP. + + Implementations MUST NOT not add line feeds to base encoded data + unless the specification referring to this document explicitly + directs base encoders to add line feeds after a specific number of + characters. + + + + + + + + +Josefsson Informational [Page 2] + +RFC 3548 The Base16, Base32, and Base64 Data Encodings July 2003 + + +2.2. Padding of encoded data + + In some circumstances, the use of padding ("=") in base encoded data + is not required nor used. In the general case, when assumptions on + size of transported data cannot be made, padding is required to yield + correct decoded data. + + Implementations MUST include appropriate pad characters at the end of + encoded data unless the specification referring to this document + explicitly states otherwise. + +2.3. Interpretation of non-alphabet characters in encoded data + + Base encodings use a specific, reduced, alphabet to encode binary + data. Non alphabet characters could exist within base encoded data, + caused by data corruption or by design. Non alphabet characters may + be exploited as a "covert channel", where non-protocol data can be + sent for nefarious purposes. Non alphabet characters might also be + sent in order to exploit implementation errors leading to, e.g., + buffer overflow attacks. + + Implementations MUST reject the encoding if it contains characters + outside the base alphabet when interpreting base encoded data, unless + the specification referring to this document explicitly states + otherwise. Such specifications may, as MIME does, instead state that + characters outside the base encoding alphabet should simply be + ignored when interpreting data ("be liberal in what you accept"). + Note that this means that any CRLF constitute "non alphabet + characters" and are ignored. Furthermore, such specifications may + consider the pad character, "=", as not part of the base alphabet + until the end of the string. If more than the allowed number of pad + characters are found at the end of the string, e.g., a base 64 string + terminated with "===", the excess pad characters could be ignored. + +2.4. Choosing the alphabet + + Different applications have different requirements on the characters + in the alphabet. Here are a few requirements that determine which + alphabet should be used: + + o Handled by humans. Characters "0", "O" are easily interchanged, + as well "1", "l" and "I". In the base32 alphabet below, where 0 + (zero) and 1 (one) is not present, a decoder may interpret 0 as + O, and 1 as I or L depending on case. (However, by default it + should not, see previous section.) + + + + + + +Josefsson Informational [Page 3] + +RFC 3548 The Base16, Base32, and Base64 Data Encodings July 2003 + + + o Encoded into structures that place other requirements. For base + 16 and base 32, this determines the use of upper- or lowercase + alphabets. For base 64, the non-alphanumeric characters (in + particular "/") may be problematic in file names and URLs. + + o Used as identifiers. Certain characters, notably "+" and "/" in + the base 64 alphabet, are treated as word-breaks by legacy text + search/index tools. + + There is no universally accepted alphabet that fulfills all the + requirements. In this document, we document and name some currently + used alphabets. + +3. Base 64 Encoding + + The following description of base 64 is due to [2], [3], [4] and [5]. + + The Base 64 encoding is designed to represent arbitrary sequences of + octets in a form that requires case sensitivity but need not be + humanly readable. + + A 65-character subset of US-ASCII is used, enabling 6 bits to be + represented per printable character. (The extra 65th character, "=", + is used to signify a special processing function.) + + The encoding process represents 24-bit groups of input bits as output + strings of 4 encoded characters. Proceeding from left to right, a + 24-bit input group is formed by concatenating 3 8-bit input groups. + These 24 bits are then treated as 4 concatenated 6-bit groups, each + of which is translated into a single digit in the base 64 alphabet. + + Each 6-bit group is used as an index into an array of 64 printable + characters. The character referenced by the index is placed in the + output string. + + + + + + + + + + + + + + + + + +Josefsson Informational [Page 4] + +RFC 3548 The Base16, Base32, and Base64 Data Encodings July 2003 + + + Table 1: The Base 64 Alphabet + + Value Encoding Value Encoding Value Encoding Value Encoding + 0 A 17 R 34 i 51 z + 1 B 18 S 35 j 52 0 + 2 C 19 T 36 k 53 1 + 3 D 20 U 37 l 54 2 + 4 E 21 V 38 m 55 3 + 5 F 22 W 39 n 56 4 + 6 G 23 X 40 o 57 5 + 7 H 24 Y 41 p 58 6 + 8 I 25 Z 42 q 59 7 + 9 J 26 a 43 r 60 8 + 10 K 27 b 44 s 61 9 + 11 L 28 c 45 t 62 + + 12 M 29 d 46 u 63 / + 13 N 30 e 47 v + 14 O 31 f 48 w (pad) = + 15 P 32 g 49 x + 16 Q 33 h 50 y + + Special processing is performed if fewer than 24 bits are available + at the end of the data being encoded. A full encoding quantum is + always completed at the end of a quantity. When fewer than 24 input + bits are available in an input group, zero bits are added (on the + right) to form an integral number of 6-bit groups. Padding at the + end of the data is performed using the '=' character. Since all base + 64 input is an integral number of octets, only the following cases + can arise: + + (1) the final quantum of encoding input is an integral multiple of 24 + bits; here, the final unit of encoded output will be an integral + multiple of 4 characters with no "=" padding, + + (2) the final quantum of encoding input is exactly 8 bits; here, the + final unit of encoded output will be two characters followed by two + "=" padding characters, or + + (3) the final quantum of encoding input is exactly 16 bits; here, the + final unit of encoded output will be three characters followed by one + "=" padding character. + + + + + + + + + + +Josefsson Informational [Page 5] + +RFC 3548 The Base16, Base32, and Base64 Data Encodings July 2003 + + +4. Base 64 Encoding with URL and Filename Safe Alphabet + + The Base 64 encoding with an URL and filename safe alphabet has been + used in [8]. + + An alternative alphabet has been suggested that used "~" as the 63rd + character. Since the "~" character has special meaning in some file + system environments, the encoding described in this section is + recommended instead. + + This encoding should not be regarded as the same as the "base64" + encoding, and should not be referred to as only "base64". Unless + made clear, "base64" refer to the base 64 in the previous section. + + This encoding is technically identical to the previous one, except + for the 62:nd and 63:rd alphabet character, as indicated in table 2. + + Table 2: The "URL and Filename safe" Base 64 Alphabet + + Value Encoding Value Encoding Value Encoding Value Encoding + 0 A 17 R 34 i 51 z + 1 B 18 S 35 j 52 0 + 2 C 19 T 36 k 53 1 + 3 D 20 U 37 l 54 2 + 4 E 21 V 38 m 55 3 + 5 F 22 W 39 n 56 4 + 6 G 23 X 40 o 57 5 + 7 H 24 Y 41 p 58 6 + 8 I 25 Z 42 q 59 7 + 9 J 26 a 43 r 60 8 + 10 K 27 b 44 s 61 9 + 11 L 28 c 45 t 62 - (minus) + 12 M 29 d 46 u 63 _ (understrike) + 13 N 30 e 47 v + 14 O 31 f 48 w (pad) = + 15 P 32 g 49 x + 16 Q 33 h 50 y + +5. Base 32 Encoding + + The following description of base 32 is due to [7] (with + corrections). + + The Base 32 encoding is designed to represent arbitrary sequences of + octets in a form that needs to be case insensitive but need not be + humanly readable. + + + + + +Josefsson Informational [Page 6] + +RFC 3548 The Base16, Base32, and Base64 Data Encodings July 2003 + + + A 33-character subset of US-ASCII is used, enabling 5 bits to be + represented per printable character. (The extra 33rd character, "=", + is used to signify a special processing function.) + + The encoding process represents 40-bit groups of input bits as output + strings of 8 encoded characters. Proceeding from left to right, a + 40-bit input group is formed by concatenating 5 8bit input groups. + These 40 bits are then treated as 8 concatenated 5-bit groups, each + of which is translated into a single digit in the base 32 alphabet. + When encoding a bit stream via the base 32 encoding, the bit stream + must be presumed to be ordered with the most-significant-bit first. + That is, the first bit in the stream will be the high-order bit in + the first 8bit byte, and the eighth bit will be the low-order bit in + the first 8bit byte, and so on. + + Each 5-bit group is used as an index into an array of 32 printable + characters. The character referenced by the index is placed in the + output string. These characters, identified in Table 2, below, are + selected from US-ASCII digits and uppercase letters. + + Table 3: The Base 32 Alphabet + + Value Encoding Value Encoding Value Encoding Value Encoding + 0 A 9 J 18 S 27 3 + 1 B 10 K 19 T 28 4 + 2 C 11 L 20 U 29 5 + 3 D 12 M 21 V 30 6 + 4 E 13 N 22 W 31 7 + 5 F 14 O 23 X + 6 G 15 P 24 Y (pad) = + 7 H 16 Q 25 Z + 8 I 17 R 26 2 + + + Special processing is performed if fewer than 40 bits are available + at the end of the data being encoded. A full encoding quantum is + always completed at the end of a body. When fewer than 40 input bits + are available in an input group, zero bits are added (on the right) + to form an integral number of 5-bit groups. Padding at the end of + the data is performed using the "=" character. Since all base 32 + input is an integral number of octets, only the following cases can + arise: + + (1) the final quantum of encoding input is an integral multiple of 40 + bits; here, the final unit of encoded output will be an integral + multiple of 8 characters with no "=" padding, + + + + + +Josefsson Informational [Page 7] + +RFC 3548 The Base16, Base32, and Base64 Data Encodings July 2003 + + + (2) the final quantum of encoding input is exactly 8 bits; here, the + final unit of encoded output will be two characters followed by six + "=" padding characters, + + (3) the final quantum of encoding input is exactly 16 bits; here, the + final unit of encoded output will be four characters followed by four + "=" padding characters, + + (4) the final quantum of encoding input is exactly 24 bits; here, the + final unit of encoded output will be five characters followed by + three "=" padding characters, or + + (5) the final quantum of encoding input is exactly 32 bits; here, the + final unit of encoded output will be seven characters followed by one + "=" padding character. + +6. Base 16 Encoding + + The following description is original but analogous to previous + descriptions. Essentially, Base 16 encoding is the standard standard + case insensitive hex encoding, and may be referred to as "base16" or + "hex". + + A 16-character subset of US-ASCII is used, enabling 4 bits to be + represented per printable character. + + The encoding process represents 8-bit groups (octets) of input bits + as output strings of 2 encoded characters. Proceeding from left to + right, a 8-bit input is taken from the input data. These 8 bits are + then treated as 2 concatenated 4-bit groups, each of which is + translated into a single digit in the base 16 alphabet. + + Each 4-bit group is used as an index into an array of 16 printable + characters. The character referenced by the index is placed in the + output string. + + Table 5: The Base 16 Alphabet + + Value Encoding Value Encoding Value Encoding Value Encoding + 0 0 4 4 8 8 12 C + 1 1 5 5 9 9 13 D + 2 2 6 6 10 A 14 E + 3 3 7 7 11 B 15 F + + Unlike base 32 and base 64, no special padding is necessary since a + full code word is always available. + + + + + +Josefsson Informational [Page 8] + +RFC 3548 The Base16, Base32, and Base64 Data Encodings July 2003 + + +7. Illustrations and examples + + To translate between binary and a base encoding, the input is stored + in a structure and the output is extracted. The case for base 64 is + displayed in the following figure, borrowed from [4]. + + +--first octet--+-second octet--+--third octet--+ + |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0| + +-----------+---+-------+-------+---+-----------+ + |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0| + +--1.index--+--2.index--+--3.index--+--4.index--+ + + The case for base 32 is shown in the following figure, borrowed from + [6]. Each successive character in a base-32 value represents 5 + successive bits of the underlying octet sequence. Thus, each group + of 8 characters represents a sequence of 5 octets (40 bits). + + 1 2 3 + 01234567 89012345 67890123 45678901 23456789 + +--------+--------+--------+--------+--------+ + |< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >| + +--------+--------+--------+--------+--------+ + <===> 8th character + <====> 7th character + <===> 6th character + <====> 5th character + <====> 4th character + <===> 3rd character + <====> 2nd character + <===> 1st character + + + + + + + + + + + + + + + + + + + + + +Josefsson Informational [Page 9] + +RFC 3548 The Base16, Base32, and Base64 Data Encodings July 2003 + + + The following example of Base64 data is from [4]. + + Input data: 0x14fb9c03d97e + Hex: 1 4 f b 9 c | 0 3 d 9 7 e + 8-bit: 00010100 11111011 10011100 | 00000011 11011001 + 11111110 + 6-bit: 000101 001111 101110 011100 | 000000 111101 100111 + 111110 + Decimal: 5 15 46 28 0 61 37 62 + Output: F P u c A 9 l + + + Input data: 0x14fb9c03d9 + Hex: 1 4 f b 9 c | 0 3 d 9 + 8-bit: 00010100 11111011 10011100 | 00000011 11011001 + pad with 00 + 6-bit: 000101 001111 101110 011100 | 000000 111101 100100 + Decimal: 5 15 46 28 0 61 36 + pad with = + Output: F P u c A 9 k = + + Input data: 0x14fb9c03 + Hex: 1 4 f b 9 c | 0 3 + 8-bit: 00010100 11111011 10011100 | 00000011 + pad with 0000 + 6-bit: 000101 001111 101110 011100 | 000000 110000 + Decimal: 5 15 46 28 0 48 + pad with = = + Output: F P u c A w = = + +8. Security Considerations + + When implementing Base encoding and decoding, care should be taken + not to introduce vulnerabilities to buffer overflow attacks, or other + attacks on the implementation. A decoder should not break on invalid + input including, e.g., embedded NUL characters (ASCII 0). + + If non-alphabet characters are ignored, instead of causing rejection + of the entire encoding (as recommended), a covert channel that can be + used to "leak" information is made possible. The implications of + this should be understood in applications that do not follow the + recommended practice. Similarly, when the base 16 and base 32 + alphabets are handled case insensitively, alteration of case can be + used to leak information. + + Base encoding visually hides otherwise easily recognized information, + such as passwords, but does not provide any computational + confidentiality. This has been known to cause security incidents + when, e.g., a user reports details of a network protocol exchange + + + +Josefsson Informational [Page 10] + +RFC 3548 The Base16, Base32, and Base64 Data Encodings July 2003 + + + (perhaps to illustrate some other problem) and accidentally reveals + the password because she is unaware that the base encoding does not + protect the password. + +9. References + +9.1. Normative References + + [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement + Levels", BCP 14, RFC 2119, March 1997. + +9.2. Informative References + + [2] Linn, J., "Privacy Enhancement for Internet Electronic Mail: + Part I: Message Encryption and Authentication Procedures", RFC + 1421, February 1993. + + [3] Freed, N. and N. Borenstein, "Multipurpose Internet Mail + Extensions (MIME) Part One: Format of Internet Message Bodies", + RFC 2045, November 1996. + + [4] Callas, J., Donnerhacke, L., Finney, H. and R. Thayer, "OpenPGP + Message Format", RFC 2440, November 1998. + + [5] Eastlake, D., "Domain Name System Security Extensions", RFC 2535, + March 1999. + + [6] Klyne, G. and L. Masinter, "Identifying Composite Media + Features", RFC 2938, September 2000. + + [7] Myers, J., "SASL GSSAPI mechanisms", Work in Progress. + + [8] Wilcox-O'Hearn, B., "Post to P2P-hackers mailing list", World + Wide Web http://zgp.org/pipermail/p2p-hackers/2001- + September/000315.html, September 2001. + + [9] Cerf, V., "ASCII format for Network Interchange", RFC 20, October + 1969. + +10. Acknowledgements + + Several people offered comments and suggestions, including Tony + Hansen, Gordon Mohr, John Myers, Chris Newman, and Andrew Sieber. + Text used in this document is based on earlier RFCs describing + specific uses of various base encodings. The author acknowledges the + RSA Laboratories for supporting the work that led to this document. + + + + + +Josefsson Informational [Page 11] + +RFC 3548 The Base16, Base32, and Base64 Data Encodings July 2003 + + +11. Editor's Address + + Simon Josefsson + EMail: simon@josefsson.org + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Josefsson Informational [Page 12] + +RFC 3548 The Base16, Base32, and Base64 Data Encodings July 2003 + + +12. Full Copyright Statement + + Copyright (C) The Internet Society (2003). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assignees. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. + + + + + + + + + + + + + + + + + + + +Josefsson Informational [Page 13] + -- cgit v1.2.3