From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001 From: Thomas Voss Date: Wed, 27 Nov 2024 20:54:24 +0100 Subject: doc: Add RFC documents --- doc/rfc/rfc5564.txt | 619 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 619 insertions(+) create mode 100644 doc/rfc/rfc5564.txt (limited to 'doc/rfc/rfc5564.txt') diff --git a/doc/rfc/rfc5564.txt b/doc/rfc/rfc5564.txt new file mode 100644 index 0000000..d9b413a --- /dev/null +++ b/doc/rfc/rfc5564.txt @@ -0,0 +1,619 @@ + + + + + + +Independent Submission A. El-Sherbiny +Request for Comments: 5564 M. Farah +Category: Informational UN-ESCWA +ISSN: 2070-1721 I. Oueichek + Syrian Telecom Establishment + A. Al-Zoman + SaudiNIC, CITC + February 2010 + + + Linguistic Guidelines for the Use of + the Arabic Language in Internet Domains + +Abstract + + This document constitutes technical specifications for the use of + Arabic in Internet domain names and provides linguistic guidelines + for Arabic domain names. It addresses Arabic-specific linguistic + issues pertaining to the use of Arabic language in domain names. + +Status of This Memo + + This document is not an Internet Standards Track specification; it is + published for informational purposes. + + This is a contribution to the RFC Series, independently of any other + RFC stream. The RFC Editor has chosen to publish this document at + its discretion and makes no statement about its value for + implementation or deployment. Documents approved for publication by + the RFC Editor are not a candidate for any level of Internet + Standard; see Section 2 of RFC 5741. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + http://www.rfc-editor.org/info/rfc5564. + + + + + + + + + + + + + + + + +El-Sherbiny, et al. Informational [Page 1] + +RFC 5564 Arabic Character Guidelines February 2010 + + +Copyright Notice + + Copyright (c) 2010 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. + + This document may not be modified, and derivative works of it may not + be created, except to format it for publication as an RFC or to + translate it into languages other than English. + +Table of Contents + + 1. Introduction ....................................................2 + 2. Arabic Language-Specific Issues .................................3 + 2.1. Linguistic Issues ..........................................4 + 2.1.1. Diacritics (Tashkeel) and Shadda ....................4 + 2.1.2. Kasheeda or Tatweel (Horizontal Character + Size Extension) .....................................5 + 2.1.3. Character Folding ...................................5 + 2.2. Supported Character Set ....................................6 + 2.3. Arabic Linguistic Issues Affected by Technical + Constraints ................................................8 + 2.3.1. Numerals ............................................8 + 2.3.2. The Space Character .................................8 + 3. Summary and Conclusion ..........................................8 + 4. Security Considerations .........................................9 + 5. Acknowledgments .................................................9 + 6. References ......................................................9 + 6.1. Normative References .......................................9 + 6.2. Informative References .....................................9 + +1. Introduction + + The Internet Engineering Task Force (IETF) issued in March 2003 a set + of RFCs for Internationalized Domain Names (IDN) ([1], [2], and [3]), + which were planned to become the de facto standard for all languages. + In 2007 and 2008, the following working drafts were released that + propose revisions to the IDNA protocol: + + o Internationalized Domain Names for Applications (IDNA): + Background, Explanation, and Rationale [5] + + + + +El-Sherbiny, et al. Informational [Page 2] + +RFC 5564 Arabic Character Guidelines February 2010 + + + o Internationalized Domain Names in Applications (IDNA): Protocol + [6] + + o An updated IDNA criterion for right-to-left scripts [7] + + o The Unicode code points and IDNA [8] + + These documents are known collectively as "IDNA2008". + + This document constitutes a technical specification for the + implementation of the IDN standards in the case of the Arabic + language. It will allow the use of standard language tables to write + domain names in Arabic characters. Therefore, it should be + considered as a logical extension to the IDN standards. It thus + presents guidelines for the proper use of Arabic characters with the + IDN standards in an Arabic language context. + + This document reflects the recommendations of the Arab Working Group + on Arabic Domain Names (AWG-ADN), established by the League of Arab + States (LAS), based on standardisation efforts of the United Nations + Economic and Social Commission for Western Asia (UN-ESCWA) and on + that group's document, "Guidelines for an Arabic Internet Domain + Name" [9]. This document is also in full harmony with recent + rigorous discussions that took place within the major language + communities that use the Arabic script in their languages. + + This document provides guidelines for the ways Arabic characters may + be used for registering Internet domain names and how linguistic- + specific issues should be handled. A few rules are recommended for + application at the protocol level. + + The key words "MUST", "REQUIRED", "SHOULD", "RECOMMENDED", and "MAY" + in this document are to be interpreted as described in RFC 2119 [4]. + + Comments on this document are solicited and should be addressed to + the working group's mailing list at ESCWA-ICTD@un.org and/or the + author(s). + +2. Arabic Language-Specific Issues + + The main objective of the creation of Arabic domain names is to have + a vehicle to increase Internet use amongst all strata of the Arabic- + speaking communities. + + Furthermore, a non-user-friendly domain name would further add to the + ambiguity and the eccentricity of the Internet to the Arabic-speaking + communities, thus contributing negatively to the spread of the + + + + +El-Sherbiny, et al. Informational [Page 3] + +RFC 5564 Arabic Character Guidelines February 2010 + + + Internet and leading to further isolation of these communities at the + global level. + + Hence, there have been intensive efforts (especially those + spearheaded by Dr. Al-Zoman and contributed to by UN-ESCWA and its + Arabic Domain Names Task Force (ADN-TF)) to reach consensus on a + multitude of linguistic issues with the following goals: + + o To define the accepted Arabic character set to be used for writing + domain names in Arabic, which is the subject of this document. + + o To define the top-level domains of the Arabic domain name tree + structure (i.e., Arabic gTLDs and ccTLDs). This goal will be + handled in a separate document. + + The first meeting of the AWG-ADN, held in Damascus from January- + February 2005, gave special attention to the following: + + o Simplification of the domain names, whenever possible, to + facilitate the interaction of the Arabic user with the Internet. + + o Adoption of solutions that do not lead to confusion either in + reading or in writing, provided that this does not compromise the + linguistic correctness of used words. + + o Mixing Arabic and non-Arabic letters in the domain name label is + not acceptable. + +2.1. Linguistic Issues + + There are a number of linguistic issues that have been proposed with + respect to the use of the Arabic language in domain names. This + section will highlight some of them. This section is based on the + papers of Dr. Al-Zoman ([10] and [11]) and on the report of the first + meeting of AWG-ADN [12]. For details, the reader is encouraged to + review these references. + +2.1.1. Diacritics (Tashkeel) and Shadda + + Tashkeel and Shadda are accent marks placed above or below Arabic + letters to produce proper pronunciation. They are thus used to + differentiate different meanings for different words with the same + base characters. + + Neither Tashkeel nor Shadda are permitted in zone files when + registering domain names in the Arabic language, although they are + permitted in the current edition of IDNA2008. They can be supported + + + + +El-Sherbiny, et al. Informational [Page 4] + +RFC 5564 Arabic Character Guidelines February 2010 + + + or ignored, if necessary, in the user interface with local mappings + and can be stripped before IDNA processing. + + The following are their Unicode presentations: + + U+064B ARABIC FATHATAN + U+064C ARABIC DAMMATAN + U+064D ARABIC KASRATAN + U+064E ARABIC FATHA + U+064F ARABIC DAMMA + U+0650 ARABIC KASRA + U+0651 ARABIC SHADDA + U+0652 ARABIC SUKUN + +2.1.2. Kasheeda or Tatweel (Horizontal Character Size Extension) + + Kasheeda (U+0640 ARABIC TATWEEL) must not be used in Arabic domain + names and should be disallowed for Arabic language domain names. The + Kasheeda is not a letter and does not have an effect on + pronunciation. It is used to extend the horizontal length or change + the shape of the preceding letter for graphical representation + purposes in Arabic writing. Accordingly, it has no value for the + writing of domain names. The same applies to all languages using the + Arabic script. The authors recommend that it should be disallowed at + the protocol level. + +2.1.3. Character Folding + + Character folding is the process where multiple letters (that may + have some similarity with respect to their shapes) are folded into + one shape. Examples of such Arabic characters include: + + o Folding Teh Marbuta (U+0629) and Heh (U+0647) at the end of a word + + o Folding different forms of Hamzah (U+0622, U+0623, U+0625, U+0627) + + o Folding Alef Maksura (U+0649) and Yeh (U+064A) at the end of a + word + + o Folding Waw with Hamzah Above (U+0624) and Waw (U+0648) + + With respect to the Arabic language, character folding is not + acceptable because it changes the meaning of words and is against the + principle of spelling rules. Replacing a character valid for use in + domain names with another character also valid for use in domain + names, which may have a similar shape, will give a different meaning. + This will lead to only one word representing several words consisting + + + + +El-Sherbiny, et al. Informational [Page 5] + +RFC 5564 Arabic Character Guidelines February 2010 + + + of all the combinations of folded characters. Hence, the other words + will be masked by a single word [10]. + + Mis-spelling or handwriting errors do occur, leading to mixing + different characters despite the fact that this is not the case in + published and printed materials. One of the motivations of this + effort is to preserve the language, particularly with the spread of + the globalization movement. Within this context, character folding + is working against this motivation since it is going to have a + negative effect on the principle and ethics of the language. + Technology should work to preserve the language and not to destroy + it. Thus, character folding should not be allowed. The case of + digits is treated in a separate section below. + +2.2. Supported Character Set + + A domain name to be written in Arabic must be composed of a sequence + of the following UNICODE characters and the FULL STOP (u+002E) to + separate the labels. These are based on UNICODE version 5.0. The + tables below are constructed using an inclusion-based approach. + Thus, characters that are not part of these tables are prohibited. + + +---------+-------------------------------------+ + | Unicode | Character Name | + +---------+-------------------------------------+ + | 0621 | ARABIC LETTER HAMZA | + | 0622 | ARABIC LETTER ALEF WITH MADDA ABOVE | + | 0623 | ARABIC LETTER ALEF WITH HAMZA ABOVE | + | 0624 | ARABIC LETTER WAW WITH HAMZA ABOVE | + | 0625 | ARABIC LETTER ALEF WITH HAMZA BELOW | + | 0626 | ARABIC LETTER YEH WITH HAMZA ABOVE | + | 0627 | ARABIC LETTER ALEF | + | 0628 | ARABIC LETTER BEH | + | 0629 | ARABIC LETTER TEH MARBUTA | + | 062A | ARABIC LETTER TEH | + | 062B | ARABIC LETTER THEH | + | 062C | ARABIC LETTER JEEM | + | 062D | ARABIC LETTER HAH | + | 062E | ARABIC LETTER KHAH | + | 062F | ARABIC LETTER DAL | + | 0630 | ARABIC LETTER THAL | + | 0631 | ARABIC LETTER REH | + | 0632 | ARABIC LETTER ZAIN | + | 0633 | ARABIC LETTER SEEN | + | 0634 | ARABIC LETTER SHEEN | + | 0635 | ARABIC LETTER SAD | + | 0636 | ARABIC LETTER DAD | + | 0637 | ARABIC LETTER TAH | + + + +El-Sherbiny, et al. Informational [Page 6] + +RFC 5564 Arabic Character Guidelines February 2010 + + + | 0638 | ARABIC LETTER ZAH | + | 0639 | ARABIC LETTER AIN | + | 063A | ARABIC LETTER GHAIN | + | 0641 | ARABIC LETTER FEH | + | 0642 | ARABIC LETTER QAF | + | 0643 | ARABIC LETTER KAF | + | 0644 | ARABIC LETTER LAM | + | 0645 | ARABIC LETTER MEEM | + | 0646 | ARABIC LETTER NOON | + | 0647 | ARABIC LETTER HEH | + | 0648 | ARABIC LETTER WAW | + | 0649 | ARABIC LETTER ALEF MAKSURA | + | 064A | ARABIC LETTER YEH | + | 0660 | ARABIC-INDIC DIGIT ZERO | + | 0661 | ARABIC-INDIC DIGIT ONE | + | 0662 | ARABIC-INDIC DIGIT TWO | + | 0663 | ARABIC-INDIC DIGIT THREE | + | 0664 | ARABIC-INDIC DIGIT FOUR | + | 0665 | ARABIC-INDIC DIGIT FIVE | + | 0666 | ARABIC-INDIC DIGIT SIX | + | 0667 | ARABIC-INDIC DIGIT SEVEN | + | 0668 | ARABIC-INDIC DIGIT EIGHT | + | 0669 | ARABIC-INDIC DIGIT NINE | + +---------+-------------------------------------+ + + Source: Supporting the Arabic Language in Domain Names [10] + Table 1: CHARACTERS FROM UNICODE ARABIC TABLE (0600-06FF) + + +---------+-----------------+ + | Unicode | Digit Name | + +---------+-----------------+ + | 0030 | DIGIT ZERO | + | 0031 | DIGIT ONE | + | 0032 | DIGIT TWO | + | 0033 | DIGIT THREE | + | 0034 | DIGIT FOUR | + | 0035 | DIGIT FIVE | + | 0036 | DIGIT SIX | + | 0037 | DIGIT SEVEN | + | 0038 | DIGIT EIGHT | + | 0039 | DIGIT NINE | + | 002D | HYPHEN-MINUS | + +---------+-----------------+ + + Source: Supporting the Arabic Language in Domain Names [10] + Table 2: CHARACTERS FROM UNICODE BASIC LATIN TABLE (0000-007F) + + + + + +El-Sherbiny, et al. Informational [Page 7] + +RFC 5564 Arabic Character Guidelines February 2010 + + +2.3. Arabic Linguistic Issues Affected by Technical Constraints + + In this section, technical aspects of some linguistic issues are + discussed. + +2.3.1. Numerals + + In the Arab countries, there are two sets of numerical digits used: + + o Set I: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) mostly used in the western + part of the Arab world. + + o Set II: (u+0660, u+0661, u+0662, u+0663, u+0664, u+0665, u+0666, + u+0667, u+0668, u+0669) mostly used in the eastern part of the + Arab world. + + Both sets may be supported in the user interface; however, the rule + of numeral homogeneity must be observed. The rule specifies that + digits from the Arabic-Indic set of numerals (u+0660 to u+0669) + should not be allowed to mix with ASCII digits (u+0030 to u+0039) + within the same Arabic domain name label. Thus, the appearance of a + digit from one set prevents the use of any other digit from the other + set. + +2.3.2. The Space Character + + The space character is strictly disallowed in domain names, as it is + a control character. Instead, the hyphen (Al-sharta, i.e., u+02D) is + proposed as a separator between Arabic words to avoid confusion that + can take place if the words are typed without a separator. + + It is acceptable to use the hyphen to separate between words within + the same domain name label. + +3. Summary and Conclusion + + The proposed guidelines are in full accordance with the IETF IDN + standards and take into account Arabic-language-specific issues + within a compromise between grammatical rules of the Arabic language + and ease of use of that language on the Internet. + + In summary, the guidelines specify that, in Arabic domain names: + + o Accent marks (Tashkeel and Shadda) are not permitted. + + o Character folding is not permitted. + + + + + +El-Sherbiny, et al. Informational [Page 8] + +RFC 5564 Arabic Character Guidelines February 2010 + + + o If a numeral from the Arabic-Indic or ASCII digit sets appears in + a label, numeral homogeneity is required. + + o The hyphen must be used as a word separator instead of space. + +4. Security Considerations + + No particular security considerations could be identified regarding + the use of Arabic characters in writing domain names. In particular, + any potential visual confusion between different character strings is + avoided using the guidelines proposed in this document. + +5. Acknowledgments + + ESCWA ICT Division provided support and funding for the development + of this document with the objective of reaching a standard for + comprehensive Arabic domain names. Thanks are due to SaudiNIC for + its continuous efforts in supporting the development of Arabic domain + names. + + John Klensin and Harald Alvestrand reviewed the document and provided + useful editorial and substantive support to enrich it. + +6. References + +6.1. Normative References + + [1] Faltstrom, P., Hoffman, P., and A. Costello, + "Internationalizing Domain Names in Applications (IDNA)", RFC + 3490, March 2003. + + [2] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile + for Internationalized Domain Names (IDN)", RFC 3491, March + 2003. + + [3] Costello, A., "Punycode: A Bootstring encoding of Unicode for + Internationalized Domain Names in Applications (IDNA)", RFC + 3492, March 2003. + + [4] Bradner, S., "Key words for use in RFCs to Indicate Requirement + Levels", BCP 14, RFC 2119, March 1997. + +6.2. Informative References + + [5] Klensin, J., "Internationalized Domain Names for Applications + (IDNA): Definitions, Background and Rationale", Work in + Progress, September 2008. + + + + +El-Sherbiny, et al. Informational [Page 9] + +RFC 5564 Arabic Character Guidelines February 2010 + + + [6] Klensin, J., "Internationalized Domain Names in Applications + (IDNA): Protocol", Work in Progress, September 2008. + + [7] Alvestrand, H. and C. Karp, "An updated IDNA criterion for + right-to-left scripts", Work in Progress, July 2008. + + [8] Faltstrom, P., "The Unicode Codepoints and IDNA", Work in + Progress, July 2008. + + [9] United Nations Economic and Social Commission for Western Asia + (UN-ESCWA), "Guidelines for an Arabic Domain Name System + (ADNS)", Work in Progress, November 2007. + + [10] Al-Zoman, A., "Supporting the Arabic Language in Domain Names", + October 2003, . + + [11] Al-Zoman, A., "Arabic Top-Level Domains", Paper presented in + Expert Group Meeting on Promotion of Digital Arabic Content, + the United Nations, Economic and Social Commission for Western + Asia, Beirut, June 2003. + + [12] League of Arab States, "Report of the first meeting of AWG-ADN, + Damascus", February 2005, . + + + + + + + + + + + + + + + + + + + + + + + + + + +El-Sherbiny, et al. Informational [Page 10] + +RFC 5564 Arabic Character Guidelines February 2010 + + +Authors' Addresses + + Ayman El-Sherbiny + Information and Communication Technology Division ESCWA + UN-House + P.O. Box 11-8575 + Beirut + Lebanon + + EMail: El-sherbiny@un.org + + + Mansour Farah + Information and Communication Technology Division ESCWA + UN-House + P.O. Box 11-8575 + Beirut + Lebanon + + EMail: farah14@un.org + + + Ibaa Oueichek + Syrian Telecom Establishment + Damascus + Syria + + EMail: oueichek@scs-net.org + + + Abdulaziz H. Al-Zoman, PhD + SaudiNIC, General Directorate of Internet Services + IT Sector, CITC + King Abdulaziz City for Science and Technology + PO Box 6086 + Riyadh 11442 + Saudi Arabia + + EMail: azoman@citc.gov.sa + + + + + + + + + + + + +El-Sherbiny, et al. Informational [Page 11] + -- cgit v1.2.3