From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001 From: Thomas Voss Date: Wed, 27 Nov 2024 20:54:24 +0100 Subject: doc: Add RFC documents --- doc/rfc/rfc8369.txt | 619 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 619 insertions(+) create mode 100644 doc/rfc/rfc8369.txt (limited to 'doc/rfc/rfc8369.txt') diff --git a/doc/rfc/rfc8369.txt b/doc/rfc/rfc8369.txt new file mode 100644 index 0000000..104de75 --- /dev/null +++ b/doc/rfc/rfc8369.txt @@ -0,0 +1,619 @@ + + + + + + +Independent Submission H. Kaplan +Request for Comments: 8369 128 Technology +Category: Informational 1 April 2018 +ISSN: 2070-1721 + + + Internationalizing IPv6 Using 128-Bit Unicode + +Abstract + + It is clear that Unicode will eventually exhaust its supply of code + points, and more will be needed. Assuming ISO and the Unicode + Consortium follow the practices of the IETF, the next Unicode code + point size will be 128 bits. This document describes how this future + 128-bit Unicode can be leveraged to improve IPv6 adoption and finally + bring internationalization support to IPv6. + +Status of This Memo + + This document is not an Internet Standards Track specification; it is + published for informational purposes. + + This is a contribution to the RFC Series, independently of any other + RFC stream. The RFC Editor has chosen to publish this document at + its discretion and makes no statement about its value for + implementation or deployment. Documents approved for publication by + the RFC Editor are not candidates for any level of Internet Standard; + see Section 2 of RFC 7841. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + https://www.rfc-editor.org/info/rfc8369. + +Copyright Notice + + Copyright (c) 2018 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. + + + + + + + +Kaplan Informational [Page 1] + +RFC 8369 Unicode IPv6 Addressing 1 April 2018 + + +Table of Contents + + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 + 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3 + 1.2. Definitions . . . . . . . . . . . . . . . . . . . . . . . 4 + 2. The Need for 128-Bit Code Points . . . . . . . . . . . . . . 4 + 3. Unicode IPv6 Addresses . . . . . . . . . . . . . . . . . . . 6 + 3.1. Reserved Addresses . . . . . . . . . . . . . . . . . . . 6 + 3.2. Multicast . . . . . . . . . . . . . . . . . . . . . . . . 7 + 3.3. IPv6 Routing . . . . . . . . . . . . . . . . . . . . . . 7 + 4. Using Unicode IPv6 Addresses . . . . . . . . . . . . . . . . 8 + 4.1. Uniform Resource Identifiers . . . . . . . . . . . . . . 8 + 4.2. Address Allocation and Resolution . . . . . . . . . . . . 8 + 5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 + 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 + 7. Security Considerations . . . . . . . . . . . . . . . . . . . 9 + 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 10 + 8.1. Normative References . . . . . . . . . . . . . . . . . . 10 + 8.2. Informative References . . . . . . . . . . . . . . . . . 10 + Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 11 + Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 11 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Kaplan Informational [Page 2] + +RFC 8369 Unicode IPv6 Addressing 1 April 2018 + + +1. Introduction + + Unicode [Unicode] is currently limited to 1,114,112 code points, + encoded in various encoding formats (e.g., UTF-8, UTF-16, UTF-32). + At the time of this document's publication, 136,755 code points have + been allocated, with more already in the approval process. Every + year, more writing scripts, symbols, and emojis are added, while none + are removed. After consulting expert mathematicians, we have + determined that the world will run out of code points someday in the + future. + + While it might appear that the current rate of code point allocation + gives us plenty of time to deal with the exhaustion problem, the + Internet's history has shown that popular number spaces do not fill + up linearly, but rather exponentially. And once the size of a + particular number space becomes entrenched, it takes decades to + migrate to a larger one. Therefore, the code point number space must + be increased as soon as possible. + + The details for expanding the Unicode code point space are not + covered in this document. Such details need to be worked out between + the IETF, ISO, the Unicode Consortium, and various gods. We assume, + however, that the code point space will need to grow dramatically, + and there will continue to be a need for a fixed-length encoding + scheme similar to UTF-32. Naturally, the next size increment should + go from UTF-32 to UTF-128, and thus the rest of this document follows + this assumption. + + This new 128-bit Unicode code point space can be leveraged by the + IETF to address one of the lingering issues with IPv6: there's not + much left to standardize. With the changes described in this + document, the IETF will be kept busy for decades to come. It also + enables new features and market opportunities, to help the global + economy. This in turn will increase tax revenues for governments, + which eventually may lead to increased funds for combating global + warming. Therefore, the ultimate goal of this document is to reduce + global warming. + +1.1. Requirements Language + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and + "OPTIONAL" in this document are to be interpreted as described in + BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all + capitals, as shown here. All other words SHOULD be interpreted as + described by the Oxford English Dictionary OED [OED], which MAY be + considered almost as authoritative for word definitions as the IETF. + + + + +Kaplan Informational [Page 3] + +RFC 8369 Unicode IPv6 Addressing 1 April 2018 + + +1.2. Definitions + + UTF-128: A fixed-length encoding for 128-bit Unicode. It is + implemented as an array of bytes in the same manner as legacy + IPv6 addresses to avoid endianness issues. + + Short-Name Tag: A short descriptive name for a Unicode character + code point, surrounded by colons (:). For example ":garfield:" + represents the Unicode code point for the Garfield cat imoji. + + Emoji: Pictographic symbol encoded in Unicode, used to express a + general item, concept, or emotion. + + Imoji: Pictographic symbol encoded in Unicode, used to represent an + individual, specific thing: a specific human, a favorite pet, a + famous animal, etc. + + Amoji: Pictographic symbol encoded in Unicode, used to represent an + individual of an alien species. + + Umoji: Pictographic symbol encoded in Unicode, used to represent + unknown things not covered by the other mojis. + + Omoji: Pictographic symbol encoded in Unicode, used to represent + obfuscated identities, used as addresses for the purpose of + privacy. + +2. The Need for 128-Bit Code Points + + The exponentially increasing demand for Unicode character code points + might not be obvious at first glance. While it is true that the + number of languages and their writing scripts do not grow quickly + over time, one type of "character" will: emojis. Unicode has barely + begun providing code points for all of the various emojis currently + in use, and it is likely that more emojis will be created in the + future. For example, there are still missing emoji symbols for most + types of food and drink, the flags of each town and city on Earth, + all human sporting and leisure activities including all local and + national sports teams and players, and every plant and animal species + and gender. + + Furthermore, it has become common for some applications to allow + their users to create custom emojis, whereby the user can provide the + graphic to display for a new "character". For example, a user might + set their chat application to display a graphic of Carlos Ramirez's + popular "Trollface" meme [TROLL], using the short-name tag + ':trollface:' in their chat application. All other users of the same + chat app will be able to see and use the same custom trollface emoji. + + + +Kaplan Informational [Page 4] + +RFC 8369 Unicode IPv6 Addressing 1 April 2018 + + + However, since there is no Unicode code point for :trollface:, the + chat app cannot export the trollface emoji to other chat apps or + protocols, such as Internet Relay Chat (IRC) or the Extensible + Messaging and Presence Protocol (XMPP). This represents a clear + interoperability issue. + + In the future, it might also become desirable to assign each human a + Unicode code point to represent them, similar to names, with the + glyph being a picture of their face or a custom graphic. These + personal code points are not truly "emojis" in the classical sense of + being generic concepts, so we've decided to give them a new name to + avoid confusion: imoji. Unlike human names, code points for imojis + will be unique per human, for all space and time. Favorite pets and + famous animals can also be assigned imojis. + + Lastly, if we ever encounter sentient species from other planets, + they too will need Unicode code points for their writing scripts and + emojis; and they will each need unique amojis (imojis for aliens), + for whatever form their individual identity might take. Section 4 of + RFC 8136 [RFC8136] clearly supports such a scenario, with the new UFO + IPv6 option. + + Based on the above obvious use cases, it is clear that the current ~1 + million code points are nowhere near enough. Increasing to 64 bits + might be sufficient for now, but since this will be a painful + transition process no matter the size, we believe jumping to 128 bits + is the appropriate choice. + + Note: The current limit of ~1 million code points is a formal limit + due to what UTF-16 can encode today. Increasing the limit will + either require deprecating UTF-16 or paying a hefty overhead penalty + to encode 128 bits across many pairs of surrogate code points. Since + the ultimate goal of this document is to reduce global warming, the + challenge of choosing between deprecating UTF-16 or paying the + overhead price is a trivial dilemma to solve by comparison. + + + + + + + + + + + + + + + + +Kaplan Informational [Page 5] + +RFC 8369 Unicode IPv6 Addressing 1 April 2018 + + +3. Unicode IPv6 Addresses + + Assuming the new Unicode code point space is 128 bits -- excluding + some reserved bits for backwards compatibility and future expansion + -- it seems only natural to use Unicode code points for IPv6 + addresses, and vice versa. This leads to some exciting changes, such + as: + + o IPv6 addresses no longer need to be typed as hex values -- + instead, the glyph for the script character, symbol, emoji, or + imoji representing that address can be input by the user, and it + will be displayed by the application as the graphic itself. From + the user's perspective, this will also be more compact than the + representation described in RFC 1924 [RFC1924]. + + o Network monitoring and troubleshooting tools can now display + pretty glyphs in place of ugly IPv6 addresses, leading to less + stress on the eyes of network administrators. + + o For cases where graphical glyphs cannot be used, such as IETF + documents, we can deprecate the legacy textual notation of IPv6 + addresses of the style '2001:db8:85a3::8a2e:370:7334' to the + simpler Unicode textual notation + 'U+20010DB885A3000000008A2E03707334'. Using the short-name tag is + also possible, such as ':v6address-1:'. + + Due to the nature of having IPv6 addresses be Unicode code points, + RFC 8135 [RFC8135] is made obsolete by this document. It was found + to be too complex to implement anyway. + +3.1. Reserved Addresses + + Some address code points will be inappropriate for IPv6 addressing, + such as formatting characters and control codes. Such code points + MUST NOT be used for IPv6 addresses. + + We do, however, still need to reserve some code points for private + network use. Since no sentient life has been found on Mars, the code + points that would have been allocated for Martian imojis are hereby + allocated for this private use. These addresses are thus called + "Martians", also known as "Bogons" due to them being bogus. + + Note: Should life be found on Mars in the future, new code points + will be allocated for them. To avoid confusion, they will be called + "Barsoom Indigenous Glyph Off-world Network" addresses, or "Bigons" + (pronounced "bye-gons"). We're certain the Martians will let Bogons + be bygones, and Bigons be Bigons. + + + + +Kaplan Informational [Page 6] + +RFC 8369 Unicode IPv6 Addressing 1 April 2018 + + +3.2. Multicast + + In both IPv4 and IPv6, multicast addresses have been relegated to + predefined IP address ranges, limiting how many multicast groups + could be used simultaneously. Given the rise of broadcasting-style + social media platforms, and the market demand for individuals to be + watched/followed by numerous random strangers constantly, it seems + clear that we need to be able to multicast everything, all the time. + + To address this need, the high-order bit of the 128-bit code point + space SHALL be reserved to indicate multicast. All valid code points + (i.e., IPv6 addresses) will thus have multicast counterparts. For + example, the code point allocated for :cat: is U+1F408. The + multicast group U+8000000000000000000000000001F408 is thus for + content about cats. Note that this is for general cat content -- + other code points are allocated for specific cat content, such as joy + cat, grinning cat, pouty cat, etc. For an individual cat like + Garfield, setting the high-order bit to the code point allocated for + :garfield: will indicate that it is multicast content about Garfield. + + Source-specific multicast also plays a role; for example, joining the + :garfield: multicast group and restricting it to a source of + :garfield: results in only receiving content about Garfield, from + Garfield. + +3.3. IPv6 Routing + + There should be little impact on routing using code-point-based IPv6 + addresses. There might be some exponential growth in routing and + forwarding tables due to difficulties in aggregating code points; + hopefully, this will be offset by increases in processor and memory + capacity. Of course this will also drive the need to frequently + upgrade networking hardware, resulting in a boost to the global + economy, and thus a reduction in global warming. + + One improvement to routing that MAY be considered is for scenic + routing as defined by RFC 7511 [RFC7511]. With emojis and imojis + being available for addressing, we can now specify which exact type + of scenery to visit along the way, or even which exact avian carrier + [RFC6214] to ride with. Note that avian carriers as described in RFC + 1149 [RFC1149] are not supported, since they only support IPv4. + + + + + + + + + + +Kaplan Informational [Page 7] + +RFC 8369 Unicode IPv6 Addressing 1 April 2018 + + +4. Using Unicode IPv6 Addresses + +4.1. Uniform Resource Identifiers + + Uniform Resource Identifiers (URIs) and Uniform Resource Locators + (URLs) already support Unicode through Internationalized Resource + Identifiers (IRIs), but these are merely a means to use multiple + Unicode characters to indicate a resource. With 128-bit Unicode, the + number space is large enough to identify each resource with a single + Unicode character. Why waste space and time typing out multiple + characters, when you can just use one? + + For URLs, this new model might only mean using a single Unicode + character for the hostname portion -- for example, a corporate logo + in place of the legacy corporate domain name. Another alternative is + to allocate a code point for the entire host and path, possibly even + including the scheme. These types of decisions can be made in future + IETF Working Groups. + + The interesting aspect of this change for URIs/URLs is that no + address lookup needs to be performed. The single 128-bit Unicode for + the URL *is* the IPv6 address. An additional step is only needed if + the user inputs a private Unicode character or short-name tag that + needs to be converted to a publicly allocated one. This would + require Network Address Translation (NAT) from the private code point + or short-name tag to a public Unicode code point. This can be done + locally, thus finally bringing NATs into the last part of the + Internet in which they are not currently deployed: the user's + application. + +4.2. Address Allocation and Resolution + + It is obvious that once a single 128-bit Unicode character is used + for addresses and URIs, using domain names will quickly become + obsolete. The subsequent collapse of the domain name industry + presents a threat to the world economy, which MUST be addressed. + + One solution to this danger is to establish a Unicode registry model + and an accompanying Code Point Unicode Resolution System (CPURS, + pronounced "keepers"). CPURS would replace DNS and provide an + architecture and resolution mechanism to resolve Unicode code points + to their registered glyphs and short-name tags, and vice versa. The + new Unicode registries and registrars would thus replace the legacy + domain name counterparts. This would lead to a new gold rush for + registering Unicode code points for corporate logos and product + icons, and thus usher in an era of economic prosperity, which would + eventually reduce global warming. + + + + +Kaplan Informational [Page 8] + +RFC 8369 Unicode IPv6 Addressing 1 April 2018 + + + Once Unicode registries and CPURS are in place, IPv6 addresses would + be allocated by registering code points through that system; they + would no longer be registered by IANA and RIRs. This is not a major + concern, however, because any tax revenue loss will be more than + offset by Unicode registries allocating code points. Furthermore, in + order to make CPURS possible, the actual graphic files for the glyphs + need to be standardized and created in numerous formats and sizes, + with various intellectual property rules. This will provide more + work for graphic artists and lawyers, further increasing tax revenue. + + The astute reader might ask why we need CPURS if Unicode translation + is performed locally on hosts today. The answer is volume: it is + unlikely that host applications can keep up with the rate of new + Unicode code points being allocated for emojis, imojis, and umojis. + While application and operating system updates have been occurring at + an ever-increasing rate, and will soon reach the same rate as human + births, it is doubtful that it will ever reach the rate of sentient + extraterrestrial births. Therefore, we need a system that can scale + to reach such volume before we make first contact; otherwise, the + diplomatic failure to quickly provide the aliens with amojis of their + own may lead to armed conflict. An armed conflict with other + sentient beings capable of reaching Earth might increase global + warming, defeating this document's ultimate purpose. + +5. Summary + + There is still much to be decided on, most of which is frankly rather + boring. It is clear, however, that 128-bit Unicode code points will + be needed eventually, and IPv6 addressing MUST be migrated to it. + Thus, the time to act is now! + +6. IANA Considerations + + This document has no IANA actions. + +7. Security Considerations + + The main security concern with using 128-bit Unicode for IPv6 + addressing is the need for privacy, in terms of anonymity. If an + IPv6 packet is sent with an imoji or amoji address, then man-in-the- + middle devices in the network will know the specific human or alien + that sent or received the packet. Using such information might lead + to heated discussions, thereby increasing global warming. + + To address this concern, an IPv6 address MAY be obfuscated by using + an omoji. An omoji is simply the original Unicode code point but + with the least-significant bit set; all other types of 128-bit + Unicode code points MUST have the least-significant bit cleared. The + + + +Kaplan Informational [Page 9] + +RFC 8369 Unicode IPv6 Addressing 1 April 2018 + + + graphical representation of an omoji is the same as its unobfsucated + moji counterpart, except that it is covered over by a solid black + block. + + By setting the least-significant bit of the source or destination and + thus turning it into an omoji, the IPv6 address is obfuscated and the + true identity cannot be determined, while IPv6 routers can still + route the packet appropriately. Note that this only provides a bit + of privacy, but every bit helps. + +8. References + +8.1. Normative References + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, + DOI 10.17487/RFC2119, March 1997, + . + + [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC + 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, + May 2017, . + +8.2. Informative References + + [OED] Oxford University Press, "Oxford English Dictionary", + . + + [RFC1149] Waitzman, D., "Standard for the transmission of IP + datagrams on avian carriers", RFC 1149, + DOI 10.17487/RFC1149, April 1990, + . + + [RFC1924] Elz, R., "A Compact Representation of IPv6 Addresses", + RFC 1924, DOI 10.17487/RFC1924, April 1996, + . + + [RFC6214] Carpenter, B. and R. Hinden, "Adaptation of RFC 1149 for + IPv6", RFC 6214, DOI 10.17487/RFC6214, April 2011, + . + + [RFC7511] Wilhelm, M., "Scenic Routing for IPv6", RFC 7511, + DOI 10.17487/RFC7511, April 2015, + . + + [RFC8135] Danielson, M. and M. Nilsson, "Complex Addressing in + IPv6", RFC 8135, DOI 10.17487/RFC8135, April 2017, + . + + + +Kaplan Informational [Page 10] + +RFC 8369 Unicode IPv6 Addressing 1 April 2018 + + + [RFC8136] Carpenter, B. and R. Hinden, "Additional Transition + Functionality for IPv6", RFC 8136, DOI 10.17487/RFC8136, + April 2017, . + + [TROLL] The Meme Wikia, "Trollface", + . + + [Unicode] The Unicode Consortium, "Unicode", . + +Acknowledgements + + The authors wish to thank the following people for providing the + inspiration for this work: Cal Henderson, Carlos Ramirez, Graham + Linehan, Agnetha Faltskog, Bjorn Ulvaeus, Benny Andersson, and + Anni-Frid Lyngstad. + +Author's Address + + Hadriel Kaplan + 128 Technology + Burlington, MA + United States of America + + Email: hadriel@128technology.com + + + + + + + + + + + + + + + + + + + + + + + + + + + +Kaplan Informational [Page 11] + -- cgit v1.2.3