Punycode

From Academic Kids

Punycode, defined in RFC 3492, is an encoding of Unicode strings into the limited character set supported by the Domain Name System; the RFC describes it as an instance of a more general algorithm called "Bootstring". The encoding is used as part of IDNA (Internationalizing Domain Names in Applications), a system that enables internationalized domain names in all languages supported by Unicode, in which the burden of translation lies entirely with the user application (for example, the Web browser).

The encoding is applied separately to each component (label) of a domain name that cannot be represented in ASCII alone, and the reserved prefix "xn--" is prepended to the resulting Punycode string. For example, bücher becomes bcher-kva in Punycode, so the domain name bücher.ch is represented in IDNA as xn--bcher-kva.ch.
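The per-label step can be sketched in Python, assuming Python 3.7 or later and its built-in "punycode" codec, which implements RFC 3492. The helper domain_to_ascii is a hypothetical name, and the sketch deliberately skips the Nameprep normalization and length checks that real IDNA performs; the built-in "idna" codec, which does apply them, is shown for comparison.

```python
def domain_to_ascii(domain: str) -> str:
    """Hypothetical helper: Punycode-encode each non-ASCII label and add "xn--".

    Simplification: no Nameprep normalization and no DNS length checks.
    """
    out = []
    for label in domain.split("."):
        if label.isascii():
            out.append(label)  # ASCII labels pass through unchanged
        else:
            # Python's built-in "punycode" codec implements RFC 3492
            out.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(out)


print(domain_to_ascii("bücher.ch"))   # xn--bcher-kva.ch
print("bücher.ch".encode("idna"))     # b'xn--bcher-kva.ch' (built-in IDNA 2003 codec)
```

Only labels that actually contain non-ASCII characters receive the prefix, so example.com would pass through untouched.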

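The non-ASCII characters are removed from the string, leaving "bcher", and after a hyphen a code is appended that records which characters were removed and where they belong (here ü, code point 252). In this case the code is "kva". Its letters are base-36 digits, where a–z (or A–Z) stand for 0–25 and 0–9 stand for 26–35, so "kva" is the digit sequence 10, 21, 0, read least-significant digit first. Each digit position has a threshold, and a digit below its threshold ends the number; with the initial bias, the first two positions have threshold 1, so the second digit carries weight 36 − 1 = 35, while the third digit, 0, is below its threshold of 26 and terminates the number. The value is therefore 10 + 35 × 21 = 745. The five characters of "bcher" leave six possible positions for the "ü", and 745 = 6 × 124 + 1: the remainder 1 means the "ü" is inserted after the first basic character, and 124 is the offset from the starting code point 128, giving 128 + 124 = 252.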
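The arithmetic above can be checked with a short Python sketch. The parameter values (base 36, tmin 1, tmax 26, initial bias 72, initial code point 128) are those of RFC 3492; the function names read_integer and digit_value are hypothetical, and the sketch assumes well-formed input.

```python
BASE, TMIN, TMAX = 36, 1, 26          # Punycode parameters from RFC 3492
INITIAL_BIAS, INITIAL_N = 72, 128


def digit_value(ch: str) -> int:
    """a-z or A-Z -> 0..25, 0-9 -> 26..35 (assumes a valid digit)."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    return ord(ch.lower()) - ord("a")


def read_integer(code: str, bias: int = INITIAL_BIAS) -> int:
    """Read one generalized variable-length integer, least significant digit first."""
    value, weight = 0, 1
    for j, ch in enumerate(code):
        digit = digit_value(ch)
        value += digit * weight
        t = min(max(BASE * (j + 1) - bias, TMIN), TMAX)   # threshold for position j
        if digit < t:                                     # below threshold: last digit
            return value
        weight *= BASE - t                                # weight of the next position
    raise ValueError("integer not terminated")


value = read_integer("kva")                 # 10*1 + 21*35 + 0*1225 = 745
offset, position = divmod(value, 6)         # 6 insertion slots around "bcher"
print(value, offset, position, chr(INITIAL_N + offset))   # 745 124 1 ü
```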

A "generalized variable-length integer", a number system with a variable base, allows codes of varying length to follow one another without separate delimiters. When a string contains several non-ASCII characters, their codes are emitted in order of Unicode code point, not in the order in which the characters occur in the string. Each code records only the increment from the previous state rather than the full value, and these increments tend to be smaller than the values themselves.
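The encoder can be sketched as a fairly direct transcription of the encoding procedure in RFC 3492 into Python. This is a sketch rather than a reference implementation: the function names are hypothetical, and the RFC's overflow checks and optional case annotations are omitted.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def adapt(delta: int, num_points: int, first_time: bool) -> int:
    """Bias adaptation function from RFC 3492."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def encode_digit(d: int) -> str:
    return chr(d + 97) if d < 26 else chr(d - 26 + 48)  # 0..25 -> a..z, 26..35 -> 0..9


def punycode_encode(text: str) -> str:
    code_points = [ord(c) for c in text]
    output = [c for c in text if ord(c) < 0x80]       # basic (ASCII) code points first
    h = b = len(output)
    if b:
        output.append("-")                             # delimiter
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(code_points):
        m = min(c for c in code_points if c >= n)      # next code point, in Unicode order
        delta += (m - n) * (h + 1)                     # skip the states for n..m-1
        n = m
        for c in code_points:
            if c < n:
                delta += 1
            elif c == n:
                q, k = delta, BASE                     # emit delta as a variable-length integer
                while True:
                    t = min(max(k - bias, TMIN), TMAX)
                    if q < t:
                        break
                    output.append(encode_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(encode_digit(q))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


print(punycode_encode("bücher"))    # bcher-kva
print(punycode_encode("tūdaliņ"))   # tdali-d8a8w
```

Choosing the minimum remaining code point on each pass is what puts the codes in Unicode order, and delta only ever grows by the distance from the previous state, which keeps the emitted integers small.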

Compare the ASCII Punycode URL http://xn--tdali-d8a8w.lv/ with its Unicode counterpart, which contains the Latvian letters ū and ņ: http://tūdaliņ.lv (written in percent-encoded form as http://t%C5%ABdali%C5%86.lv). A page encoded in ISO-8859-1 cannot represent the Unicode form correctly, because that character set contains neither ū nor ņ.
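Going the other way, from xn--tdali-d8a8w.lv back to tūdaliņ.lv, can be sketched with the decoding procedure of RFC 3492. As with the encoder, the names are hypothetical, and the RFC's overflow and validity checks are mostly omitted, so this is illustrative rather than hardened.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def adapt(delta: int, num_points: int, first_time: bool) -> int:
    """Bias adaptation function from RFC 3492."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def decode_digit(ch: str) -> int:
    """0-9 -> 26..35, a-z or A-Z -> 0..25."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A")
    raise ValueError(f"invalid Punycode digit {ch!r}")


def punycode_decode(code: str) -> str:
    split = code.rfind("-")                    # the last "-" ends the basic code points
    output = list(code[:split]) if split > 0 else []
    pos = split + 1 if split > 0 else 0
    n, i, bias = INITIAL_N, 0, INITIAL_BIAS
    while pos < len(code):
        old_i, weight, k = i, 1, BASE
        while True:                            # read one variable-length integer
            if pos >= len(code):
                raise ValueError("truncated Punycode input")
            digit = decode_digit(code[pos])
            pos += 1
            i += digit * weight
            t = min(max(k - bias, TMIN), TMAX)
            if digit < t:
                break
            weight *= BASE - t
            k += BASE
        slots = len(output) + 1                # possible insertion positions
        bias = adapt(i - old_i, slots, old_i == 0)
        n += i // slots                        # advance the code point
        i %= slots                             # and find where it is inserted
        output.insert(i, chr(n))
        i += 1
    return "".join(output)


def domain_to_unicode(host: str) -> str:
    """Hypothetical helper: decode every label that carries the xn-- prefix."""
    return ".".join(
        punycode_decode(label[4:]) if label.lower().startswith("xn--") else label
        for label in host.split(".")
    )


print(domain_to_unicode("xn--tdali-d8a8w.lv"))   # tūdaliņ.lv
```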

Google can search within Punycode-encoded sites; for example, enter the query site:tūdaliņ.lv (http://www.google.com/search?hl=en&lr=&safe=off&c2coff=1&q=site%3A.t%C5%ABdali%C5%86.lv&btnG=Search).

Punycode is designed to work with all scripts and to be self-optimising: as it encodes, it adapts to the ranges of code points that occur in the string. It is optimised for strings made up of zero or more ASCII characters plus characters from only one other script, but it copes with any Unicode string. Note that for DNS use, the domain name is assumed to have been normalised using Nameprep and (for top-level domains) filtered against an officially registered language table before being converted to Punycode, and that the DNS protocol limits the length of the resulting Punycode string: each label, including the xn-- prefix, may be at most 63 octets.
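The order of operations for DNS use (normalise first, then encode, then check the length limits) can be sketched with the standard library's IDNA 2003 support, where encodings.idna.nameprep implements Nameprep. The helpers label_to_ascii and name_to_ascii are hypothetical; language-table filtering is not modelled, and the 253-character limit is assumed as the textual equivalent of the 255-octet wire-format limit on a whole name.

```python
from encodings.idna import nameprep   # IDNA 2003 Nameprep, in the standard library

MAX_LABEL_OCTETS = 63    # DNS limit per label
MAX_NAME_OCTETS = 253    # assumed limit for a dotted name without the trailing dot


def label_to_ascii(label: str) -> str:
    """Hypothetical helper: Nameprep, then Punycode, then the DNS label limit."""
    prepared = nameprep(label)                 # case-folds and normalises, e.g. "Bücher" -> "bücher"
    if prepared.isascii():
        ascii_label = prepared
    else:
        ascii_label = "xn--" + prepared.encode("punycode").decode("ascii")
    if not 0 < len(ascii_label) <= MAX_LABEL_OCTETS:
        raise ValueError(f"label {label!r} is empty or longer than {MAX_LABEL_OCTETS} octets")
    return ascii_label


def name_to_ascii(name: str) -> str:
    """Hypothetical helper: convert every label, then check the whole name."""
    ascii_name = ".".join(label_to_ascii(part) for part in name.split("."))
    if len(ascii_name) > MAX_NAME_OCTETS:
        raise ValueError("domain name too long")
    return ascii_name


print(name_to_ascii("Bücher.CH"))   # xn--bcher-kva.ch
```

Because a trailing dot produces an empty final label, this sketch rejects fully qualified names such as example.com.; a fuller version would strip the dot first.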

Spoofing concerns

Because IDNA lets websites use full Unicode names, it can leave users open to phishing attacks. Characters from different scripts can look identical (for example, Cyrillic "а" and Latin "a"), so it is possible to create a spoofed web site that looks exactly like another, with a matching-looking domain name and a valid security certificate, while in fact it is controlled by someone attempting to steal private information. See Internationalizing Domain Names in Applications for more.
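The look-alike problem can be illustrated with the Python standard library by comparing a Latin name with one in which a single letter is replaced by U+0430 CYRILLIC SMALL LETTER A. The name "paypal" is used purely as an illustration. The two strings render alike, but they are different strings, and their IDNA forms make the difference visible.

```python
import unicodedata

genuine = "paypal"
spoof = "p\u0430ypal"          # U+0430 CYRILLIC SMALL LETTER A in place of the Latin "a"

print(genuine == spoof)                    # False, although they render alike
print(unicodedata.name(spoof[1]))          # CYRILLIC SMALL LETTER A
print(genuine.encode("idna"))              # b'paypal'
print(spoof.encode("idna"))                # b'xn--pypal-4ve'
```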

Rather than preventing users from reaching internationalized websites, Firefox displays the Punycode form of domain names by default so that spoofed websites are easier to spot. Mozilla does not see this as a permanent fix, and it is unlikely to satisfy critics who urge browser makers to keep supporting IDN. Safari, as of Security Update 2005-003, does the same for a configurable list of scripts, including the three most likely to mislead: Greek, Cyrillic, and Cherokee.
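A script-list display policy of the kind described for Safari might be sketched as follows. This is not Safari's actual logic: the set SUSPICIOUS_SCRIPTS and both helpers are hypothetical, and using the first word of a character's Unicode name is only a rough stand-in for a real script lookup.

```python
import unicodedata

# Hypothetical configurable list, modelled on the scripts named above.
SUSPICIOUS_SCRIPTS = {"GREEK", "CYRILLIC", "CHEROKEE"}


def script_of(ch: str) -> str:
    """Rough proxy: first word of the character's Unicode name (not a real script lookup)."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def display_label(label: str) -> str:
    """Show Punycode if any character comes from a listed script, otherwise Unicode."""
    if any(script_of(ch) in SUSPICIOUS_SCRIPTS for ch in label):
        return "xn--" + label.encode("punycode").decode("ascii")
    return label


print(display_label("bücher"))        # bücher
print(display_label("p\u0430ypal"))   # xn--pypal-4ve
```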

To address concerns about Punycode usability, Opera uses a whitelist of top-level domains whose registries enforce rules against such exploits. Domains under a whitelisted TLD are displayed in Unicode, whereas domains under untrusted TLDs are displayed in Punycode with the xn-- prefix. Characters from Latin-1 are displayed for all TLDs, even those not on the whitelist, because Latin-1 offers little opportunity for exploits using misleading characters.
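The whitelist policy described for Opera might be sketched like this. It is an illustration, not Opera's implementation: display_host and is_latin1 are hypothetical names, the set of trusted TLDs is passed in by the caller because the real list is not reproduced here, and "Latin-1" is taken to mean code points U+0000 to U+00FF.

```python
def is_latin1(label: str) -> bool:
    """True if every character lies in U+0000..U+00FF (ASCII plus Latin-1)."""
    return all(ord(ch) <= 0xFF for ch in label)


def display_host(host: str, trusted_tlds: set) -> str:
    """Unicode for trusted TLDs or Latin-1-only labels, Punycode otherwise."""
    labels = host.split(".")
    trusted = labels[-1].lower() in trusted_tlds
    shown = []
    for label in labels:
        if trusted or is_latin1(label):
            shown.append(label)
        else:
            shown.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(shown)


print(display_host("bücher.ch", set()))      # bücher.ch (Latin-1 only)
print(display_host("tūdaliņ.lv", set()))     # xn--tdali-d8a8w.lv
print(display_host("tūdaliņ.lv", {"lv"}))    # tūdaliņ.lv (TLD treated as trusted)
```

Passing the trusted set as a parameter keeps the example from implying which registries any browser actually trusted.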
