Transliteration standards and Unicode

ISO Transliteration Standards

One of the oldest areas of ISO standardization is transliteration: the mapping of one script, such as Cyrillic, into another, such as Latin (romanization). Unlike transcription, which tries to approximate the pronunciation and therefore differs from one target language to another, transliteration defines a strict standard mapping that may contradict pronunciation practice but is unambiguous and reversible.

Examples are:

ISO 9 -- Transliteration of Cyrillic
ISO 233 -- Transliteration of Arabic
ISO 259 -- Transliteration of Hebrew
ISO 843 -- Conversion of Greek
ISO 3602 -- Romanization of Japanese Kana
ISO 7098 -- Romanization of Chinese
ISO 9984 -- Transliteration of Georgian
ISO 9985 -- Transliteration of Armenian
more to come
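
To make "unambiguous and reversible" concrete, here is a minimal Python sketch of a one-to-one letter table and its inverse. The table is only an illustrative lowercase subset written in the style of ISO 9 (the values are my assumption, so check the standard itself for the authoritative table), and the function names `romanize` and `cyrillize` are made up for this example.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so every accented letter is one precomposed character."""
    return unicodedata.normalize("NFC", text)


# Illustrative lowercase subset in the style of ISO 9 (assumed values,
# not copied from the standard): each Cyrillic letter maps to exactly
# one Latin character, possibly carrying a diacritic.
_PAIRS = [
    ("а", "a"), ("б", "b"), ("в", "v"), ("г", "g"), ("д", "d"), ("е", "e"),
    ("ж", "ž"), ("з", "z"), ("и", "i"), ("й", "j"), ("к", "k"), ("л", "l"),
    ("м", "m"), ("н", "n"), ("о", "o"), ("п", "p"), ("р", "r"), ("с", "s"),
    ("т", "t"), ("у", "u"), ("ф", "f"), ("х", "h"), ("ц", "c"), ("ч", "č"),
    ("ш", "š"), ("щ", "ŝ"), ("ы", "y"), ("э", "è"), ("ю", "û"), ("я", "â"),
]

CYR_TO_LAT = {nfc(cyr): nfc(lat) for cyr, lat in _PAIRS}
LAT_TO_CYR = {lat: cyr for cyr, lat in CYR_TO_LAT.items()}

# The inverse only exists because no two Cyrillic letters share a Latin value.
assert len(LAT_TO_CYR) == len(CYR_TO_LAT)


def _map_chars(text, table):
    # Characters missing from the table (uppercase, digits, spaces) pass through.
    return "".join(table.get(ch, ch) for ch in nfc(text))


def romanize(text):
    return _map_chars(text, CYR_TO_LAT)


def cyrillize(text):
    return _map_chars(text, LAT_TO_CYR)


word = "щука"
print(romanize(word))                     # ŝuka
print(cyrillize(romanize(word)) == word)  # True
```

A pronunciation-oriented transcription would typically spell the first letter with a letter group such as "shch", which reads more naturally in English but can no longer be inverted with a simple character-by-character table.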

Transliteration standards have more uses than library catalogues and bibliographies. People who communicate over the Internet in languages with non-Latin alphabets, on computers that are limited to ASCII, usually end up inventing their own transliteration schemes, either because no ISO standard exists yet or because it is not well known. Likewise, people who move to a foreign country have to invent a suitable spelling of their own name and their place of birth.

Traditionally, however, research libraries have been the primary users of transliteration standards: they need them to file books by foreign authors or with foreign titles in their alphabetic catalogues and to find them again. Because the Latin alphabet contains only 26 letters (abcdefghijklmnopqrstuvwxyz), letter groups like "huang" and accented letters like "è" are used to distinguish the characters of scripts with a richer repertoire. Because catalogue cards were written by hand or on mechanical typewriters, with dead accent keys or with accents added later by hand, the early ISO standards did not restrict themselves to characters found in ASCII or ISO-8859-1.
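
The repertoire problem can be checked mechanically. The following Python sketch asks whether a given transliterated string survives encoding in ASCII or ISO-8859-1. "huang" and "è" are taken from the paragraph above; the other sample letters are my own picks of accented letters of the kind found in romanization tables, and the helper name `fits` is hypothetical.

```python
import unicodedata


def fits(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    # NFC first, so that an accented letter is tested as one precomposed
    # character rather than as a base letter plus a combining accent.
    text = unicodedata.normalize("NFC", text)
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


SAMPLES = ["huang", "è", "â", "ž", "ŝ"]

for sample in SAMPLES:
    print(sample, fits(sample, "ascii"), fits(sample, "iso-8859-1"))
```

Only the plain letter group passes the ASCII test. "è" and "â" fit into ISO-8859-1, while "ž" and "ŝ" fit into neither, which is the situation that made such transliterations awkward to type before Unicode.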

Transliteration in the Unicode Age

Now that we have Unicode (ISO/IEC 10646), we are finally able to type heavily accented transliterations like ISO 9 on our computers. But we can also type the original script, so why bother with transliteration at all?

John Clews of the British Library, the chair of the ISO subcommittee responsible for transliteration (ISO/TC46/SC2: Conversion of Written Languages), says:

Despite computing standards like ISO/IEC 10646 and Unicode, there will always be a need for transliteration as long as people do not have the same level of competence in all scripts besides the script used in their mother-tongue, and may have a need to deal with these languages, or when they have to deal with mechanical or computerised equipment which does not provide all the scripts of characters that they need.

I agree. Transliteration standards come in just as handy for Unicode implementers and users. Unicode lets you mechanize the standard transliterations (see my simple Perl scripts below), both for displays that are limited to one or two scripts (Lynx is particularly good at mapping UTF-8 text to ASCII) and for users who prefer to read text transliterated into their own alphabet because it gives them a better idea of the phonetic value. You can also apply transliteration standards in the reverse direction (as some of my Yudit kmaps do) to enter non-Latin characters on a standard American computer keyboard.
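
The reverse direction can be sketched as a small input method that turns ASCII keystrokes into Cyrillic letters, always trying the longest key sequence first. This is Python rather than a Yudit kmap, and the key sequences in `KEYMAP` are invented for illustration; they follow neither the Yudit kmap format nor any ISO table.

```python
# Hypothetical ASCII key sequences for a few lowercase Cyrillic letters.
# Multi-letter sequences are needed because ASCII has no accented letters.
KEYMAP = {
    "shch": "щ", "zh": "ж", "ch": "ч", "sh": "ш", "yu": "ю", "ya": "я",
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е",
    "z": "з", "i": "и", "k": "к", "l": "л", "m": "м", "n": "н",
    "o": "о", "p": "п", "r": "р", "s": "с", "t": "т", "u": "у",
    "f": "ф", "h": "х", "c": "ц", "y": "ы",
}
MAX_LEN = max(len(keys) for keys in KEYMAP)


def type_cyrillic(keys):
    """Convert ASCII keystrokes to Cyrillic using greedy longest match."""
    out = []
    i = 0
    while i < len(keys):
        # Try the longest candidate first so "shch" wins over "sh" + "ch".
        for n in range(min(MAX_LEN, len(keys) - i), 0, -1):
            chunk = keys[i:i + n]
            if chunk in KEYMAP:
                out.append(KEYMAP[chunk])
                i += n
                break
        else:
            # No sequence matched: keep the keystroke unchanged.
            out.append(keys[i])
            i += 1
    return "".join(out)


print(type_cyrillic("shchuka"))  # щука
print(type_cyrillic("moskva"))   # москва
```

Greedy matching keeps the sketch short, but it also shows why letter-group schemes are harder to reverse than one-to-one tables: with this keymap there is no way to type с followed by х, because "sh" is always read as ш.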

Sample implementations

This directory, http://czyborra.com/translit/, contains a few simple Perl scripts that perform standard transliterations on UTF-8 encoded text. If you have corrections or additions to this collection, I encourage you to send mail to roman@czyborra.com.
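
For readers who want to see the general shape of such a filter without opening the scripts, here is a rough Python sketch of a UTF-8 transliteration filter. It is not a port of the Perl scripts in this directory: the names `DEMO_TABLE`, `transliterate_utf8` and `filter_stream` are hypothetical, and the table holds just six lowercase letters with values assumed to be in the style of ISO 9.

```python
import io
import unicodedata

# Hypothetical miniature table: a handful of lowercase Cyrillic letters
# with Latin values in the style of ISO 9 (assumed, not authoritative).
DEMO_TABLE = str.maketrans({
    "г": "g", "о": "o", "р": "r", "д": "d", "ю": "û", "я": "â",
})


def transliterate_utf8(data, table=DEMO_TABLE):
    """Decode UTF-8 bytes, map characters through table, re-encode as UTF-8."""
    text = unicodedata.normalize("NFC", data.decode("utf-8"))
    # Characters that are not in the table are left unchanged.
    return text.translate(table).encode("utf-8")


def filter_stream(src, dst, table=DEMO_TABLE):
    """Copy a binary stream line by line, transliterating each line."""
    # Splitting on the newline byte is safe: in UTF-8 the byte 0x0A never
    # occurs inside a multi-byte character.
    for line in src:
        dst.write(transliterate_utf8(line, table))


# Demonstration on in-memory streams instead of real files.
source = io.BytesIO("город\nюг\n".encode("utf-8"))
sink = io.BytesIO()
filter_stream(source, sink)
print(sink.getvalue().decode("utf-8"), end="")
```

To use the same function as a command-line filter, one could pass `sys.stdin.buffer` and `sys.stdout.buffer` as `src` and `dst`, which keeps the program working on raw UTF-8 bytes regardless of the locale settings.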

Roman Czyborra
$Date: 1998/07/30 09:37:25 $

Name	Last modified	Size

Parent Directory		-
ethiopic-sera	1998-12-09 19:38	4.4K
HEADER.html	1998-11-27 22:46	4.4K
java	1998-07-13 17:36	742
greek-iso843	1998-07-13 16:14	1.3K
cyrillic-iso9	1998-07-13 10:35	3.1K
vietnamese-viqr	1998-07-13 10:35	2.0K
