Variant Chinese Domain Name
JENG-WEI LIN, Tunghai University.
JAN-MING HO, Academia Sinica.
LI-MING TSENG, National Central University.
FEIPEI LAI, National Taiwan University.
• In past years, many efforts have been made
to lower the linguistic barriers that non-native
English speakers face when accessing the Internet.
However, the traditional Internet Domain
Name System (DNS) does not support
• Traditionally, the composition of domain
labels is restricted to ASCII letters, digits, and
hyphens (abbreviated as LDH characters).
• IDN (Internationalized Domain Name) allows
people to use their native language to name
and access Internet hosts.
• IDLs (Internationalized Domain Labels) are
Unicode strings that must be encoded to, and
decoded from, an ASCII-compatible form.
• Example: http://中文.tw/ (財團法人台灣網
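The encoding step can be illustrated with Python's built-in `idna` codec, which implements the older IDNA 2003 rules. This is only a minimal sketch: the helper names `to_ascii` and `to_unicode` are chosen here for illustration and do not come from the slides.

```python
def to_ascii(domain: str) -> bytes:
    """Encode a Unicode domain name into its ASCII-compatible (LDH) form."""
    return domain.encode("idna")


def to_unicode(ascii_domain: bytes) -> str:
    """Decode an ASCII-compatible domain name back into Unicode."""
    return ascii_domain.decode("idna")


encoded = to_ascii("中文.tw")
decoded = to_unicode(encoded)
print(encoded)  # the non-ASCII label is rewritten with an "xn--" prefix
print(decoded)  # round-trips to the original name
```

The encoded form contains only LDH characters, so it can be carried by the unmodified DNS.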
Han Character Variants
• Two Han characters are said to be variants of
each other if they have the same meaning and
are pronounced the same.
清真寺(U+6E05 U+771F U+5BFA)
淸眞寺(U+6DF8 U+771E U+5BFA)
中硏院(U+4E2D U+784F U+9662)
中研院(U+4E2D U+7814 U+9662)
• For example,
五嶽 (the Five Mounts, U+4E94 U+5DBD) is
equivalent to 五岳 (U+4E94 U+5CB3),
but 嶽飛 (U+5DBD U+98DB) is not equivalent to
岳飛 (a Chinese hero's name, U+5CB3 U+98DB).
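Variant strings look alike but are distinct Unicode strings, which is why a plain DNS comparison treats them as different names. The sketch below lists the code points of the pairs above; the helper `code_points` is a hypothetical name, and the strings are written as escapes so that the exact code points are unambiguous.

```python
def code_points(text: str) -> list[str]:
    """Return the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Each pair holds two variant spellings of the same word.
pairs = [
    ("\u6e05\u771f\u5bfa", "\u6df8\u771e\u5bfa"),  # mosque
    ("\u4e2d\u784f\u9662", "\u4e2d\u7814\u9662"),  # Academia Sinica (short form)
]

for first, second in pairs:
    print(first, code_points(first))
    print(second, code_points(second))
    print("identical as strings:", first == second)
```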
Language Variant Tables and IDL
• The Language Variant Table (LVT) mechanism
is used to enforce language-based character
• In an LVT, each row lists a valid character that is
permitted in an IDL, together with its preferred
variants and its other variants.
• LVTtw [TWNIC 2005] and LVTcn [CNNIC 2005]
are the LVTs for Traditional Chinese and
Simplified Chinese, respectively.
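One way to picture an LVT row is as a small record holding the valid character, its preferred variants, and its other variants. The sketch below is an assumption for illustration: the class `LvtRow`, the helper `is_permitted`, and the sample row are hypothetical and are not copied from LVTtw or LVTcn.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LvtRow:
    """One row of a language variant table (illustrative layout)."""
    valid: str                       # character permitted in an IDL
    preferred: tuple[str, ...] = ()  # preferred variants
    others: tuple[str, ...] = ()     # other variants

    def all_variants(self) -> set[str]:
        """The valid character together with all of its variants."""
        return {self.valid, *self.preferred, *self.others}


# Illustrative row only; real tables are published by the registries.
row = LvtRow(valid="臺", preferred=("台",), others=("檯", "颱", "籉"))
lvt = {row.valid: row}


def is_permitted(label: str, table: dict[str, LvtRow]) -> bool:
    """A label is permitted only if every character has a row in the table."""
    return all(ch in table for ch in label)


print(sorted(row.all_variants()))
print(is_permitted("臺", lvt), is_permitted("臺灣", lvt))
```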
• When an IDL is registered, a collection of its
variant IDLs, known as an IDL package, is created
according to the zone-specified LVTs.
• All variant IDLs in the IDL package are unavailable
to other name holders.
• For example,
IDL: 臺灣大學 (U+81FA U+7063 U+5927 U+5B78)
According to LVTtw and LVTcn, we can
enumerate the set of variant IDLs in this
IDL package.
• In some cases, context-sensitive character
variants might be improper in an IDL package,
such as 颱, 檯, and 籉 in this example.
• A reduced version of the IDL package with [臺
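Enumerating an IDL package amounts to taking one variant per character position and combining them. The sketch below assumes that each position varies independently; the `VARIANTS` table is illustrative rather than taken from LVTtw or LVTcn, and `idl_package` is a hypothetical helper name. Passing a set of improper variants yields a reduced package.

```python
from itertools import product

# Illustrative variant sets (assumption); each list includes the character itself.
VARIANTS = {
    "臺": ["臺", "台", "檯", "颱", "籉"],
    "灣": ["灣", "湾"],
    "學": ["學", "学", "斈"],
}


def idl_package(idl: str, variants: dict[str, list[str]],
                improper: frozenset[str] = frozenset()) -> list[str]:
    """Enumerate variant IDLs, skipping variants marked as improper."""
    choices = []
    for ch in idl:
        # A character without a table entry has only itself as a choice.
        options = [v for v in variants.get(ch, [ch]) if v not in improper]
        choices.append(options)
    return ["".join(combo) for combo in product(*choices)]


full = idl_package("臺灣大學", VARIANTS)
reduced = idl_package("臺灣大學", VARIANTS, improper=frozenset({"颱", "檯", "籉"}))
print(len(full), len(reduced))
```

With these illustrative sets, removing the three improper variants of 臺 shrinks the first position from five choices to two.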
Activation of Variant IDLs
An issue of scalability
arises when there are a
large number of variant
IDLs to be activated.
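The scalability concern can be made concrete with a little arithmetic. If each character position varies independently (an assumption of this sketch), the number of variant IDLs is the product of the per-position variant counts, so it grows quickly with label length. The counts below are hypothetical.

```python
from math import prod


def package_size(variant_counts: list[int]) -> int:
    """Number of variant IDLs when each position varies independently."""
    return prod(variant_counts)


# Hypothetical label: 4 characters with 5, 2, 1 and 3 interchangeable forms.
small = package_size([5, 2, 1, 3])
# Hypothetical label: 10 characters with 2 interchangeable forms each.
large = package_size([2] * 10)
print(small, large)
```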
VARIANT IDL RESOLUTION
L: a registered IDL,
E: the variant IDL expression
As in the example shown previously,
• If some (context-sensitive) variants are
improper for L, the variant IDLs in the reduced
IDL package can be enumerated by:
• As in the example shown previously,
檯, 籉, 颱, and 斈 are improper.
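A variant IDL expression can be read as a sequence of positions, where a bracketed group lists the alternatives for one position. The sketch below assumes that bracket notation, and the sample expression `E` is illustrative; `parse_expression` and `expand` are hypothetical helper names. Removing the improper variants before expansion gives the reduced package.

```python
from itertools import product


def parse_expression(expr: str) -> list[list[str]]:
    """Split an expression such as "[ab]c" into per-position alternatives."""
    positions = []
    i = 0
    while i < len(expr):
        if expr[i] == "[":
            close = expr.index("]", i)  # raises ValueError if the group is unclosed
            positions.append(list(expr[i + 1:close]))
            i = close + 1
        else:
            positions.append([expr[i]])
            i += 1
    return positions


def expand(expr: str, improper: frozenset[str] = frozenset()) -> list[str]:
    """Enumerate the labels matched by the expression, minus improper variants."""
    positions = [[ch for ch in alternatives if ch not in improper]
                 for alternatives in parse_expression(expr)]
    return ["".join(combo) for combo in product(*positions)]


# Illustrative expression for the registered IDL 臺灣大學 (assumption).
E = "[臺台檯籉颱][灣湾]大[學学斈]"
reduced_package = expand(E, improper=frozenset({"檯", "籉", "颱", "斈"}))
print(len(expand(E)), len(reduced_package))
```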
Construction of Indexing Functions
The idea is to assign the same value to character
variants.
If X and Y are character variants of each other, we
assign H(X) = H(Y).
For an IDL L = X1 X2 ... Xd, we assign
H(L) = H(X1) H(X2) ... H(Xd).
For example, H(臺) = H(台) = ?
Figure: the graph of variant groups used to
construct the indexing function Htwcn
according to LVTtwcn.
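An indexing function of this kind can be sketched by treating the variant groups as a graph and giving every character in a connected component the same value. The code below is one possible construction under stated assumptions, not necessarily the authors': it uses a union-find structure, picks the smallest code point in each component as the index value, and uses illustrative variant groups. `build_index` and `variant_index` are hypothetical names.

```python
def build_index(variant_groups: list[str]) -> dict[str, str]:
    """Map every character to a representative of its connected variant group."""
    parent: dict[str, str] = {}

    def find(ch: str) -> str:
        parent.setdefault(ch, ch)
        while parent[ch] != ch:
            parent[ch] = parent[parent[ch]]  # path halving
            ch = parent[ch]
        return ch

    for group in variant_groups:
        root = find(group[0])
        for ch in group[1:]:
            other = find(ch)
            if other != root:
                parent[other] = root  # merge the two components

    components: dict[str, list[str]] = {}
    for ch in list(parent):
        components.setdefault(find(ch), []).append(ch)

    index = {}
    for chars in components.values():
        representative = min(chars)  # assumption: smallest code point is the value
        for ch in chars:
            index[ch] = representative
    return index


def variant_index(label: str, index: dict[str, str]) -> str:
    """H(L) = H(X1) H(X2) ... H(Xd); characters without variants map to themselves."""
    return "".join(index.get(ch, ch) for ch in label)


# Illustrative groups (assumption); the first two overlap and are merged.
H = build_index(["臺台", "台檯颱籉", "灣湾", "學学斈"])
print(variant_index("臺灣大學", H) == variant_index("台湾大学", H))
```

Because overlapping groups are merged, variants that are only related through an intermediate character still receive the same index.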
Partition of Activated Variant IDLs
Htwcn gives the same variant index to all of
the variant IDLs in an IDL package registered in
a zone that adopts LVTtwcn (the union of LVTtw
and LVTcn).
However, individual domains may adopt
Variant IDL Resolution Protocol
• The registrar stores the following “VarIdx”
RR (resource record) for the IDL 臺灣大學 in
the zone files.
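The lookup idea behind a variant index record can be sketched with an in-memory table that maps a variant index to the registered IDL. Everything below is an assumption for illustration: the per-character table `H`, the dictionary `var_idx_table`, and the functions `register` and `resolve` are hypothetical and do not reproduce the actual “VarIdx” RR format or any DNS wire format.

```python
# Hypothetical per-character index: every variant maps to one representative.
H = {"臺": "台", "台": "台", "灣": "湾", "湾": "湾", "學": "学", "学": "学"}

# Hypothetical stand-in for the stored records: variant index -> registered IDL.
var_idx_table: dict[str, str] = {}


def variant_index(label: str) -> str:
    """Concatenate the per-character index values of a label."""
    return "".join(H.get(ch, ch) for ch in label)


def register(idl: str) -> None:
    """Store the registered IDL under its variant index."""
    key = variant_index(idl)
    if key in var_idx_table:
        raise ValueError(f"a variant of {idl} is already registered")
    var_idx_table[key] = idl


def resolve(queried: str):
    """Return the registered IDL whose package contains the queried label, if any."""
    return var_idx_table.get(variant_index(queried))


register("臺灣大學")
print(resolve("台湾大学"))  # the registered IDL
print(resolve("清華大學"))  # None: no registered IDL shares this index
```

A single stored entry serves every variant in the package, so variants do not need to be activated one by one.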
EVALUATION OF INDEXING FUNCTIONS
• In CNNIC, 94% of the IDLs have fewer than 8 variant IDLs,
while in TWNIC, 91% of the IDLs have fewer than 16 variant
IDLs.
• However, 607 IDLs in CNNIC and 861 IDLs in TWNIC have more
than 64 variant IDLs, and 41 IDLs in CNNIC and 28 IDLs in TWNIC
have more than 512 variant IDLs.
We built a trial IDN service with a three-level
DNS hierarchy consisting of six domains.
Comparisons between the Traditional
Approach and the Proposed One
We register five IDN subdomains, which have
a total of 52 variant IDN subdomains. Under
these subdomains, we register 40 IDNs, which
have a total of 1,880 variant IDNs.