WARNING:
This file is encoded with Yudit's utf-8-s encoder.

I downloaded Markus Kuhn's UTF-8-demo.txt
test file from:

  http://www.cl.cam.ac.uk/~mgk25/ucs/examples/

This file contains purposefully malformed sequences.
UTF-8 text files should not contain surrogates. Yudit
reads them and indicates that they arrived as surrogates,
but displays them as supplementary plane characters.

The Glyph Info clearly indicates that something is 
wrong. When you move the cursor after this character:

      í €í°€   Glyph Info: [sgt:00010000] DC80 DC00

is displayed. For well-formed sequences, Glyph Info
should never show [sgt:]. See:

  http://www.unicode.org/versions/corrigendum1.html

When such surrogates are written back to disk, Yudit's
built-in utf-8 converter writes the shortest form, as
required by UTF-8. They are therefore written back not
as surrogates but as shorter supplementary plane
characters.
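
As an illustration of the shortest-form rule, here is a minimal
Python sketch. It is not Yudit's code. It assumes the input has no
malformed bytes other than UTF-8-encoded surrogates, and the function
name to_shortest_form is hypothetical.

```python
def to_shortest_form(data: bytes) -> bytes:
    """Rewrite UTF-8-encoded surrogate pairs as 4-byte UTF-8 sequences.

    Assumption: apart from encoded surrogates, the input is valid UTF-8.
    """
    # "surrogatepass" lets Python decode ed a0 80 .. ed bf bf as surrogates.
    text = data.decode("utf-8", errors="surrogatepass")
    out = []
    i = 0
    while i < len(text):
        cp = ord(text[i])
        if 0xD800 <= cp <= 0xDBFF and i + 1 < len(text):
            low = ord(text[i + 1])
            if 0xDC00 <= low <= 0xDFFF:
                # Combine a high/low pair into one supplementary code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
                i += 1
        out.append(chr(cp))
        i += 1
    # Unpaired surrogates are kept as they were; pairs become 4 bytes.
    return "".join(out).encode("utf-8", errors="surrogatepass")


# Test 5.2.1: six bytes in, four bytes out.
print(to_shortest_form(bytes.fromhex("eda080edb080")).hex())  # f0908080
```

The pair shrinks from six bytes to four, which is why the file on
disk changes when it is saved with a shortest-form encoder.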

If you want to keep the binary integrity of the file,
including malformed sequences and surrogate utf-8
characters, use Yudit's built-in utf-8-s converter instead
of utf-8. The utf-8-s converter is not recommended for
general use; use it only for testing. The utf-8 encoder,
by contrast, always generates the shortest form.
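
The trade-off can be sketched outside Yudit as well. In the Python
example below, the "surrogatepass" error handler is used as a loose
stand-in for utf-8-s: it preserves encoded surrogates byte for byte,
but unlike utf-8-s as described here it does not keep other malformed
sequences. The helper names are hypothetical.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def roundtrip_keeping_surrogates(data: bytes) -> bytes:
    """Decode and re-encode while passing surrogates through unchanged."""
    text = data.decode("utf-8", errors="surrogatepass")
    return text.encode("utf-8", errors="surrogatepass")


raw = bytes.fromhex("eda080edb080")  # test 5.2.1 from this file
print(is_strict_utf8(raw))                       # False
print(roundtrip_keeping_surrogates(raw) == raw)  # True
```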


Gáspár Sinai <gaspar@yudit.org> 2002-11-22

5.1 Single UTF-16 surrogates
5.1.1  U+D800 = ed a0 80 = "í €"
5.1.2  U+DB7F = ed ad bf = "í­¿"
5.1.3  U+DB80 = ed ae 80 = "í®€"
5.1.4  U+DBFF = ed af bf = "í¯¿"
5.1.5  U+DC00 = ed b0 80 = "í°€"
5.1.6  U+DF80 = ed be 80 = "í¾€"
5.1.7  U+DFFF = ed bf bf = "í¿¿"

5.2 Paired UTF-16 surrogates
5.2.1  U+D800 U+DC00 = ed a0 80 ed b0 80 = "í €í°€"
5.2.2  U+D800 U+DFFF = ed a0 80 ed bf bf = "í €í¿¿"
5.2.3  U+DB7F U+DC00 = ed ad bf ed b0 80 = "í­¿í°€"
5.2.4  U+DB7F U+DFFF = ed ad bf ed bf bf = "í­¿í¿¿"
5.2.5  U+DB80 U+DC00 = ed ae 80 ed b0 80 = "í®€í°€"
5.2.6  U+DB80 U+DFFF = ed ae 80 ed bf bf = "í®€í¿¿"
5.2.7  U+DBFF U+DC00 = ed af bf ed b0 80 = "í¯¿í°€"
5.2.8  U+DBFF U+DFFF = ed af bf ed bf bf = "í¯¿í¿¿"
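
The byte values listed in 5.1 and 5.2 can be reproduced with a short
Python sketch. This is only a cross-check of the table, under the
assumption that Python's "surrogatepass" handler applies the ordinary
three-byte UTF-8 pattern to each surrogate; the function name
surrogate_utf8_hex is hypothetical.

```python
def surrogate_utf8_hex(*code_points: int) -> str:
    """Encode each code point (surrogates allowed) and return spaced hex."""
    text = "".join(chr(cp) for cp in code_points)
    raw = text.encode("utf-8", errors="surrogatepass")
    return " ".join(f"{byte:02x}" for byte in raw)


print(surrogate_utf8_hex(0xD800))          # ed a0 80
print(surrogate_utf8_hex(0xDB7F))          # ed ad bf
print(surrogate_utf8_hex(0xDBFF, 0xDFFF))  # ed af bf ed bf bf
```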