What document(s) define(d) the Unicode standard?

The Players

Unicode's character set is defined in parallel by two co-operating bodies:

  1. as "The Unicode Standard" by the California-based Unicode Consortium, an open special-interest group founded in 1991 to design and promote a universal, uniform, unique, unambiguous worldwide character encoding standard; its members are mainly American software manufacturers (Xerox, Apple, IBM, Microsoft, Novell, Sun, NeXT, ...), whose membership fees finance the nonprofit organization, and

  2. as "ISO/IEC 10646 Universal Multiple-Octet Coded Character Set (UCS)" by the multibyte charset working group (WG2) of the character set subcommittee (SC2) of the joint technical committee JTC1 of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), defining international information processing standards through the votes of national delegates.

The relationship between Unicode and the ISO UCS has been the subject of recurring discussions. Unicode can be seen either as a special implementation of the ISO UCS or as its underlying idea. The fact is that the two standards have converged, have benefited from each other, and specify identical character codes, so that Unicode and UCS can be used synonymously for all practical purposes.

Unicode 1.0

"The Unicode Standard, Version 1.0" was first published as a book (ISBN 0-201-56788-1 from Addison-Wesley, out of print) in 1991. The first Unicode-incompatible ISO WG draft DIS-10646.1:1990 (no longer available) had been voted down by the ISO plenary in favor of the linear Unicode proposal.

Unicode 1.1

The two proposals were subsequently merged and published in 1993 as the Unicode-compatible international standard ISO/IEC 10646 Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane (price code XN = 351 CHF for 754 pages, with much less descriptive text but sharper images on brighter paper than the Unicode standard) and as an aligned Unicode 1.1 (Unicode Technical Report # 4, a diff to Unicode 1.0).

Unicode 2.0

Amendments 1..7 (UTF-16, UTF-8, reordered Hangul syllables, ...) to ISO/IEC 10646-1 were reflected in the book "The Unicode Standard, Version 2.0" (ISBN 0-201-48345-9) in 1996. The Chinese ideographs are now included in the same volume instead of being relegated to a second one. The book reads like an encyclopedia of the world's scripts and is definitely a bargain at 59 USD for 930 A4 pages. Much of its content is available online (character properties database, textual descriptions, cross references, illustrative glyph charts, even the full text of some chapters: TOC, 1, 3.11). Not yet online are the illustrated texts describing some of the more complicated algorithms (such as the Devanagari and Tamil rendering rules) and implementation hints, and the online material is not bundled as nicely as the book.

Unicode 2.1

Unicode 2.1 (Unicode Technical Report # 8, diff to Unicode 2.0) appeared in 1998. It fixed a number of errors and added U+20AC EURO SIGN for the new European currency and U+FFFC OBJECT REPLACEMENT CHARACTER as a placeholder for images and the like.
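
Code points and character names such as these can be checked against a character properties database. The following is a minimal sketch in Python, assuming the standard library's unicodedata module (which bundles a much later version of the database than Unicode 2.1); the helper name describe is made up for this illustration. U+FFFD is included only for comparison, because its name is easily confused with that of U+FFFC.

```python
import unicodedata


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME' for a code point (hypothetical helper for this sketch)."""
    # unicodedata.name() raises ValueError for code points without a name.
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


# U+20AC and U+FFFC are the Unicode 2.1 additions; U+FFFD is the older,
# similarly named REPLACEMENT CHARACTER, listed here for comparison.
for cp in (0x20AC, 0xFFFC, 0xFFFD):
    print(describe(cp))
```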

Unicode 3.0

Recent ISO-10646-1 amendments (such as Ethiopic and others in the pipeline) shall be reflected in Unicode 3.0, scheduled for publication in the near future (1999?). Besides extending the existing scripts (including 6'582 new CJK Unified Ideographs, Extension A), there shall be new support for Braille, Canadian Aboriginal Syllabics, Cherokee, Ethiopic, Khmer, Mongolian, Myanmar, Ogham, Runic, Sinhala, Syriac, Thaana, and Yi. Unicode 3.0 shall still be limited to BMP characters and may be accompanied by a revised second edition of ISO-10646-1 to reduce the spaghetti of incremental amendments.

Unicode 4.0

Some later version after Unicode 3.0, in the next millennium, shall be the first to include defined characters in the non-BMP planes. A second part, ISO-10646-2, shall comprehensively define the contents of all non-BMP planes.
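
What "non-BMP" means in practice can be sketched with the UTF-16 transformation mentioned among the amendments above: a code point in plane 0 (the BMP) fits into one 16-bit unit, while a code point in planes 1..16 is split into a pair of surrogate units. The Python sketch below is an illustration under those assumptions, not text from either standard; the function names are invented here, and U+10000 is used only as the first non-BMP code point, not as any particular character.

```python
def plane(code_point: int) -> int:
    """Plane number: 0 is the BMP, 1..16 are the planes reachable through UTF-16."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the range addressable by UTF-16")
    return code_point >> 16


def utf16_code_units(code_point: int) -> list[int]:
    """Split a code point into one or two 16-bit UTF-16 code units."""
    if plane(code_point) == 0:
        if 0xD800 <= code_point <= 0xDFFF:
            raise ValueError("surrogate values are reserved and not characters")
        return [code_point]
    # Non-BMP: spread the 20-bit offset over a high and a low surrogate.
    offset = code_point - 0x10000
    return [0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)]


for cp in (0x20AC, 0x10000):
    units = " ".join(f"{u:04X}" for u in utf16_code_units(cp))
    print(f"U+{cp:04X} plane {plane(cp)} UTF-16 units: {units}")
```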

The Unicode website also provides an official version history now.

Roman Czyborra
$Date: 1999/01/11 18:40:35 $