summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc8369.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc8369.txt')
-rw-r--r--doc/rfc/rfc8369.txt619
1 files changed, 619 insertions, 0 deletions
diff --git a/doc/rfc/rfc8369.txt b/doc/rfc/rfc8369.txt
new file mode 100644
index 0000000..104de75
--- /dev/null
+++ b/doc/rfc/rfc8369.txt
@@ -0,0 +1,619 @@
+
+
+
+
+
+
+Independent Submission H. Kaplan
+Request for Comments: 8369 128 Technology
+Category: Informational 1 April 2018
+ISSN: 2070-1721
+
+
+ Internationalizing IPv6 Using 128-Bit Unicode
+
+Abstract
+
+ It is clear that Unicode will eventually exhaust its supply of code
+ points, and more will be needed. Assuming ISO and the Unicode
+ Consortium follow the practices of the IETF, the next Unicode code
+ point size will be 128 bits. This document describes how this future
+ 128-bit Unicode can be leveraged to improve IPv6 adoption and finally
+ bring internationalization support to IPv6.
+
+Status of This Memo
+
+ This document is not an Internet Standards Track specification; it is
+ published for informational purposes.
+
+ This is a contribution to the RFC Series, independently of any other
+ RFC stream. The RFC Editor has chosen to publish this document at
+ its discretion and makes no statement about its value for
+ implementation or deployment. Documents approved for publication by
+ the RFC Editor are not candidates for any level of Internet Standard;
+ see Section 2 of RFC 7841.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ https://www.rfc-editor.org/info/rfc8369.
+
+Copyright Notice
+
+ Copyright (c) 2018 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (https://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document.
+
+
+
+
+
+
+
+Kaplan Informational [Page 1]
+
+RFC 8369 Unicode IPv6 Addressing 1 April 2018
+
+
+Table of Contents
+
+ 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
+ 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3
+ 1.2. Definitions . . . . . . . . . . . . . . . . . . . . . . . 4
+ 2. The Need for 128-Bit Code Points . . . . . . . . . . . . . . 4
+ 3. Unicode IPv6 Addresses . . . . . . . . . . . . . . . . . . . 6
+ 3.1. Reserved Addresses . . . . . . . . . . . . . . . . . . . 6
+ 3.2. Multicast . . . . . . . . . . . . . . . . . . . . . . . . 7
+ 3.3. IPv6 Routing . . . . . . . . . . . . . . . . . . . . . . 7
+ 4. Using Unicode IPv6 Addresses . . . . . . . . . . . . . . . . 8
+ 4.1. Uniform Resource Identifiers . . . . . . . . . . . . . . 8
+ 4.2. Address Allocation and Resolution . . . . . . . . . . . . 8
+ 5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
+ 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9
+ 7. Security Considerations . . . . . . . . . . . . . . . . . . . 9
+ 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 10
+ 8.1. Normative References . . . . . . . . . . . . . . . . . . 10
+ 8.2. Informative References . . . . . . . . . . . . . . . . . 10
+ Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 11
+ Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 11
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Kaplan Informational [Page 2]
+
+RFC 8369 Unicode IPv6 Addressing 1 April 2018
+
+
+1. Introduction
+
+ Unicode [Unicode] is currently limited to 1,114,112 code points,
+ encoded in various encoding formats (e.g., UTF-8, UTF-16, UTF-32).
+ At the time of this document's publication, 136,755 code points have
+ been allocated, with more already in the approval process. Every
+ year, more writing scripts, symbols, and emojis are added, while none
+ are removed. After consulting expert mathematicians, we have
+ determined that the world will run out of code points someday in the
+ future.
+
+ While it might appear that the current rate of code point allocation
+ gives us plenty of time to deal with the exhaustion problem, the
+ Internet's history has shown that popular number spaces do not fill
+ up linearly, but rather exponentially. And once the size of a
+ particular number space becomes entrenched, it takes decades to
+ migrate to a larger one. Therefore, the code point number space must
+ be increased as soon as possible.
+
+ The details for expanding the Unicode code point space are not
+ covered in this document. Such details need to be worked out between
+ the IETF, ISO, the Unicode Consortium, and various gods. We assume,
+ however, that the code point space will need to grow dramatically,
+ and there will continue to be a need for a fixed-length encoding
+ scheme similar to UTF-32. Naturally, the next size increment should
+ go from UTF-32 to UTF-128, and thus the rest of this document follows
+ this assumption.
+
+ This new 128-bit Unicode code point space can be leveraged by the
+ IETF to address one of the lingering issues with IPv6: there's not
+ much left to standardize. With the changes described in this
+ document, the IETF will be kept busy for decades to come. It also
+ enables new features and market opportunities, to help the global
+ economy. This in turn will increase tax revenues for governments,
+ which eventually may lead to increased funds for combating global
+ warming. Therefore, the ultimate goal of this document is to reduce
+ global warming.
+
+1.1. Requirements Language
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
+ "OPTIONAL" in this document are to be interpreted as described in
+ BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
+ capitals, as shown here. All other words SHOULD be interpreted as
+ described by the Oxford English Dictionary OED [OED], which MAY be
+ considered almost as authoritative for word definitions as the IETF.
+
+
+
+
+Kaplan Informational [Page 3]
+
+RFC 8369 Unicode IPv6 Addressing 1 April 2018
+
+
+1.2. Definitions
+
+ UTF-128: A fixed-length encoding for 128-bit Unicode. It is
+ implemented as an array of bytes in the same manner as legacy
+ IPv6 addresses to avoid endianness issues.
+
+ Short-Name Tag: A short descriptive name for a Unicode character
+ code point, surrounded by colons (:). For example ":garfield:"
+ represents the Unicode code point for the Garfield cat imoji.
+
+ Emoji: Pictographic symbol encoded in Unicode, used to express a
+ general item, concept, or emotion.
+
+ Imoji: Pictographic symbol encoded in Unicode, used to represent an
+ individual, specific thing: a specific human, a favorite pet, a
+ famous animal, etc.
+
+ Amoji: Pictographic symbol encoded in Unicode, used to represent an
+ individual of an alien species.
+
+ Umoji: Pictographic symbol encoded in Unicode, used to represent
+ unknown things not covered by the other mojis.
+
+ Omoji: Pictographic symbol encoded in Unicode, used to represent
+ obfuscated identities, used as addresses for the purpose of
+ privacy.
+
+2. The Need for 128-Bit Code Points
+
+ The exponentially increasing demand for Unicode character code points
+ might not be obvious at first glance. While it is true that the
+ number of languages and their writing scripts do not grow quickly
+ over time, one type of "character" will: emojis. Unicode has barely
+ begun providing code points for all of the various emojis currently
+ in use, and it is likely that more emojis will be created in the
+ future. For example, there are still missing emoji symbols for most
+ types of food and drink, the flags of each town and city on Earth,
+ all human sporting and leisure activities including all local and
+ national sports teams and players, and every plant and animal species
+ and gender.
+
+ Furthermore, it has become common for some applications to allow
+ their users to create custom emojis, whereby the user can provide the
+ graphic to display for a new "character". For example, a user might
+ set their chat application to display a graphic of Carlos Ramirez's
+ popular "Trollface" meme [TROLL], using the short-name tag
+ ':trollface:' in their chat application. All other users of the same
+ chat app will be able to see and use the same custom trollface emoji.
+
+
+
+Kaplan Informational [Page 4]
+
+RFC 8369 Unicode IPv6 Addressing 1 April 2018
+
+
+ However, since there is no Unicode code point for :trollface:, the
+ chat app cannot export the trollface emoji to other chat apps or
+ protocols, such as Internet Relay Chat (IRC) or the Extensible
+ Messaging and Presence Protocol (XMPP). This represents a clear
+ interoperability issue.
+
+ In the future, it might also become desirable to assign each human a
+ Unicode code point to represent them, similar to names, with the
+ glyph being a picture of their face or a custom graphic. These
+ personal code points are not truly "emojis" in the classical sense of
+ being generic concepts, so we've decided to give them a new name to
+ avoid confusion: imoji. Unlike human names, code points for imojis
+ will be unique per human, for all space and time. Favorite pets and
+ famous animals can also be assigned imojis.
+
+ Lastly, if we ever encounter sentient species from other planets,
+ they too will need Unicode code points for their writing scripts and
+ emojis; and they will each need unique amojis (imojis for aliens),
+ for whatever form their individual identity might take. Section 4 of
+ RFC 8136 [RFC8136] clearly supports such a scenario, with the new UFO
+ IPv6 option.
+
+ Based on the above obvious use cases, it is clear that the current ~1
+ million code points are nowhere near enough. Increasing to 64 bits
+ might be sufficient for now, but since this will be a painful
+ transition process no matter the size, we believe jumping to 128 bits
+ is the appropriate choice.
+
+ Note: The current limit of ~1 million code points is a formal limit
+ due to what UTF-16 can encode today. Increasing the limit will
+ either require deprecating UTF-16 or paying a hefty overhead penalty
+ to encode 128 bits across many pairs of surrogate code points. Since
+ the ultimate goal of this document is to reduce global warming, the
+ challenge of choosing between deprecating UTF-16 or paying the
+ overhead price is a trivial dilemma to solve by comparison.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Kaplan Informational [Page 5]
+
+RFC 8369 Unicode IPv6 Addressing 1 April 2018
+
+
+3. Unicode IPv6 Addresses
+
+ Assuming the new Unicode code point space is 128 bits -- excluding
+ some reserved bits for backwards compatibility and future expansion
+ -- it seems only natural to use Unicode code points for IPv6
+ addresses, and vice versa. This leads to some exciting changes, such
+ as:
+
+ o IPv6 addresses no longer need to be typed as hex values --
+ instead, the glyph for the script character, symbol, emoji, or
+ imoji representing that address can be input by the user, and it
+ will be displayed by the application as the graphic itself. From
+ the user's perspective, this will also be more compact than the
+ representation described in RFC 1924 [RFC1924].
+
+ o Network monitoring and troubleshooting tools can now display
+ pretty glyphs in place of ugly IPv6 addresses, leading to less
+ stress on the eyes of network administrators.
+
+ o For cases where graphical glyphs cannot be used, such as IETF
+ documents, we can deprecate the legacy textual notation of IPv6
+ addresses of the style '2001:db8:85a3::8a2e:370:7334' to the
+ simpler Unicode textual notation
+ 'U+20010DB885A3000000008A2E03707334'. Using the short-name tag is
+ also possible, such as ':v6address-1:'.
+
+ Due to the nature of having IPv6 addresses be Unicode code points,
+ RFC 8135 [RFC8135] is made obsolete by this document. It was found
+ to be too complex to implement anyway.
+
+3.1. Reserved Addresses
+
+ Some address code points will be inappropriate for IPv6 addressing,
+ such as formatting characters and control codes. Such code points
+ MUST NOT be used for IPv6 addresses.
+
+ We do, however, still need to reserve some code points for private
+ network use. Since no sentient life has been found on Mars, the code
+ points that would have been allocated for Martian imojis are hereby
+ allocated for this private use. These addresses are thus called
+ "Martians", also known as "Bogons" due to them being bogus.
+
+ Note: Should life be found on Mars in the future, new code points
+ will be allocated for them. To avoid confusion, they will be called
+ "Barsoom Indigenous Glyph Off-world Network" addresses, or "Bigons"
+ (pronounced "bye-gons"). We're certain the Martians will let Bogons
+ be bygones, and Bigons be Bigons.
+
+
+
+
+Kaplan Informational [Page 6]
+
+RFC 8369 Unicode IPv6 Addressing 1 April 2018
+
+
+3.2. Multicast
+
+ In both IPv4 and IPv6, multicast addresses have been relegated to
+ predefined IP address ranges, limiting how many multicast groups
+ could be used simultaneously. Given the rise of broadcasting-style
+ social media platforms, and the market demand for individuals to be
+ watched/followed by numerous random strangers constantly, it seems
+ clear that we need to be able to multicast everything, all the time.
+
+ To address this need, the high-order bit of the 128-bit code point
+ space SHALL be reserved to indicate multicast. All valid code points
+ (i.e., IPv6 addresses) will thus have multicast counterparts. For
+ example, the code point allocated for :cat: is U+1F408. The
+ multicast group U+8000000000000000000000000001F408 is thus for
+ content about cats. Note that this is for general cat content --
+ other code points are allocated for specific cat content, such as joy
+ cat, grinning cat, pouty cat, etc. For an individual cat like
+ Garfield, setting the high-order bit to the code point allocated for
+ :garfield: will indicate that it is multicast content about Garfield.
+
+ Source-specific multicast also plays a role; for example, joining the
+ :garfield: multicast group and restricting it to a source of
+ :garfield: results in only receiving content about Garfield, from
+ Garfield.
+
+3.3. IPv6 Routing
+
+ There should be little impact on routing using code-point-based IPv6
+ addresses. There might be some exponential growth in routing and
+ forwarding tables due to difficulties in aggregating code points;
+ hopefully, this will be offset by increases in processor and memory
+ capacity. Of course this will also drive the need to frequently
+ upgrade networking hardware, resulting in a boost to the global
+ economy, and thus a reduction in global warming.
+
+ One improvement to routing that MAY be considered is for scenic
+ routing as defined by RFC 7511 [RFC7511]. With emojis and imojis
+ being available for addressing, we can now specify which exact type
+ of scenery to visit along the way, or even which exact avian carrier
+ [RFC6214] to ride with. Note that avian carriers as described in RFC
+ 1149 [RFC1149] are not supported, since they only support IPv4.
+
+
+
+
+
+
+
+
+
+
+Kaplan Informational [Page 7]
+
+RFC 8369 Unicode IPv6 Addressing 1 April 2018
+
+
+4. Using Unicode IPv6 Addresses
+
+4.1. Uniform Resource Identifiers
+
+ Uniform Resource Identifiers (URIs) and Uniform Resource Locators
+ (URLs) already support Unicode through Internationalized Resource
+ Identifiers (IRIs), but these are merely a means to use multiple
+ Unicode characters to indicate a resource. With 128-bit Unicode, the
+ number space is large enough to identify each resource with a single
+ Unicode character. Why waste space and time typing out multiple
+ characters, when you can just use one?
+
+ For URLs, this new model might only mean using a single Unicode
+ character for the hostname portion -- for example, a corporate logo
+ in place of the legacy corporate domain name. Another alternative is
+ to allocate a code point for the entire host and path, possibly even
+ including the scheme. These types of decisions can be made in future
+ IETF Working Groups.
+
+ The interesting aspect of this change for URIs/URLs is that no
+ address lookup needs to be performed. The single 128-bit Unicode for
+ the URL *is* the IPv6 address. An additional step is only needed if
+ the user inputs a private Unicode character or short-name tag that
+ needs to be converted to a publicly allocated one. This would
+ require Network Address Translation (NAT) from the private code point
+ or short-name tag to a public Unicode code point. This can be done
+ locally, thus finally bringing NATs into the last part of the
+ Internet in which they are not currently deployed: the user's
+ application.
+
+4.2. Address Allocation and Resolution
+
+ It is obvious that once a single 128-bit Unicode character is used
+ for addresses and URIs, using domain names will quickly become
+ obsolete. The subsequent collapse of the domain name industry
+ presents a threat to the world economy, which MUST be addressed.
+
+ One solution to this danger is to establish a Unicode registry model
+ and an accompanying Code Point Unicode Resolution System (CPURS,
+ pronounced "keepers"). CPURS would replace DNS and provide an
+ architecture and resolution mechanism to resolve Unicode code points
+ to their registered glyphs and short-name tags, and vice versa. The
+ new Unicode registries and registrars would thus replace the legacy
+ domain name counterparts. This would lead to a new gold rush for
+ registering Unicode code points for corporate logos and product
+ icons, and thus usher in an era of economic prosperity, which would
+ eventually reduce global warming.
+
+
+
+
+Kaplan Informational [Page 8]
+
+RFC 8369 Unicode IPv6 Addressing 1 April 2018
+
+
+ Once Unicode registries and CPURS are in place, IPv6 addresses would
+ be allocated by registering code points through that system; they
+ would no longer be registered by IANA and RIRs. This is not a major
+ concern, however, because any tax revenue loss will be more than
+ offset by Unicode registries allocating code points. Furthermore, in
+ order to make CPURS possible, the actual graphic files for the glyphs
+ need to be standardized and created in numerous formats and sizes,
+ with various intellectual property rules. This will provide more
+ work for graphic artists and lawyers, further increasing tax revenue.
+
+ The astute reader might ask why we need CPURS if Unicode translation
+ is performed locally on hosts today. The answer is volume: it is
+ unlikely that host applications can keep up with the rate of new
+ Unicode code points being allocated for emojis, imojis, and umojis.
+ While application and operating system updates have been occurring at
+ an ever-increasing rate, and will soon reach the same rate as human
+ births, it is doubtful that it will ever reach the rate of sentient
+ extraterrestrial births. Therefore, we need a system that can scale
+ to reach such volume before we make first contact; otherwise, the
+ diplomatic failure to quickly provide the aliens with amojis of their
+ own may lead to armed conflict. An armed conflict with other
+ sentient beings capable of reaching Earth might increase global
+ warming, defeating this document's ultimate purpose.
+
+5. Summary
+
+ There is still much to be decided on, most of which is frankly rather
+ boring. It is clear, however, that 128-bit Unicode code points will
+ be needed eventually, and IPv6 addressing MUST be migrated to it.
+ Thus, the time to act is now!
+
+6. IANA Considerations
+
+ This document has no IANA actions.
+
+7. Security Considerations
+
+ The main security concern with using 128-bit Unicode for IPv6
+ addressing is the need for privacy, in terms of anonymity. If an
+ IPv6 packet is sent with an imoji or amoji address, then man-in-the-
+ middle devices in the network will know the specific human or alien
+ that sent or received the packet. Using such information might lead
+ to heated discussions, thereby increasing global warming.
+
+ To address this concern, an IPv6 address MAY be obfuscated by using
+ an omoji. An omoji is simply the original Unicode code point but
+ with the least-significant bit set; all other types of 128-bit
+ Unicode code points MUST have the least-significant bit cleared. The
+
+
+
+Kaplan Informational [Page 9]
+
+RFC 8369 Unicode IPv6 Addressing 1 April 2018
+
+
+ graphical representation of an omoji is the same as its unobfsucated
+ moji counterpart, except that it is covered over by a solid black
+ block.
+
+ By setting the least-significant bit of the source or destination and
+ thus turning it into an omoji, the IPv6 address is obfuscated and the
+ true identity cannot be determined, while IPv6 routers can still
+ route the packet appropriately. Note that this only provides a bit
+ of privacy, but every bit helps.
+
+8. References
+
+8.1. Normative References
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119,
+ DOI 10.17487/RFC2119, March 1997,
+ <https://www.rfc-editor.org/info/rfc2119>.
+
+ [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
+ 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
+ May 2017, <https://www.rfc-editor.org/info/rfc8174>.
+
+8.2. Informative References
+
+ [OED] Oxford University Press, "Oxford English Dictionary",
+ <http://www.oed.com>.
+
+ [RFC1149] Waitzman, D., "Standard for the transmission of IP
+ datagrams on avian carriers", RFC 1149,
+ DOI 10.17487/RFC1149, April 1990,
+ <https://www.rfc-editor.org/info/rfc1149>.
+
+ [RFC1924] Elz, R., "A Compact Representation of IPv6 Addresses",
+ RFC 1924, DOI 10.17487/RFC1924, April 1996,
+ <https://www.rfc-editor.org/info/rfc1924>.
+
+ [RFC6214] Carpenter, B. and R. Hinden, "Adaptation of RFC 1149 for
+ IPv6", RFC 6214, DOI 10.17487/RFC6214, April 2011,
+ <https://www.rfc-editor.org/info/rfc6214>.
+
+ [RFC7511] Wilhelm, M., "Scenic Routing for IPv6", RFC 7511,
+ DOI 10.17487/RFC7511, April 2015,
+ <https://www.rfc-editor.org/info/rfc7511>.
+
+ [RFC8135] Danielson, M. and M. Nilsson, "Complex Addressing in
+ IPv6", RFC 8135, DOI 10.17487/RFC8135, April 2017,
+ <https://www.rfc-editor.org/info/rfc8135>.
+
+
+
+Kaplan Informational [Page 10]
+
+RFC 8369 Unicode IPv6 Addressing 1 April 2018
+
+
+ [RFC8136] Carpenter, B. and R. Hinden, "Additional Transition
+ Functionality for IPv6", RFC 8136, DOI 10.17487/RFC8136,
+ April 2017, <https://www.rfc-editor.org/info/rfc8136>.
+
+ [TROLL] The Meme Wikia, "Trollface",
+ <http://meme.wikia.com/wiki/Rule_63?oldid=23602>.
+
+ [Unicode] The Unicode Consortium, "Unicode", <http://unicode.org>.
+
+Acknowledgements
+
+ The authors wish to thank the following people for providing the
+ inspiration for this work: Cal Henderson, Carlos Ramirez, Graham
+ Linehan, Agnetha Faltskog, Bjorn Ulvaeus, Benny Andersson, and
+ Anni-Frid Lyngstad.
+
+Author's Address
+
+ Hadriel Kaplan
+ 128 Technology
+ Burlington, MA
+ United States of America
+
+ Email: hadriel@128technology.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Kaplan Informational [Page 11]
+