A Little Unicode Story


This is a short story about something that happened to me while I was developing Yudit, my free Unicode text editor. It is a plain record of events, as unbiased as I can make it.

The story contains just a little technical knowledge but even people without that knowledge will understand it.

The Beginning

In November 2001 I published this article to the utf8-linux mailing list. I was looking for testers for a newly introduced feature, shaping support. Shaping is needed to render scripts such as Arabic and Syriac. In parallel with the shaping changes, I also set aside some time to port Yudit to Windows.

The Windows Port

The Windows port itself was not very difficult, because Yudit does all the work internally, including keyboard mapping and font rendering. What took some time was getting around weird limitations.

For instance, Windows refused to copy the glyph images rendered by Yudit. I had no intention of using the font rendering engine that comes with Windows, so I decided to track down the problem. I found out that Windows 98 silently refuses to create more than about 500 bitmap images for the screen, and it does so without any error message. As some scripts, like Chinese and Japanese, need more than 500 glyphs, I made a workaround: instead of creating a bitmap for each glyph, I create just one really big bitmap and use different areas of it. This worked, and the Windows version was born. Without this workaround you cannot render your own glyphs.
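The single-bitmap idea can be sketched as follows. This is only an illustration in Python, not Yudit's actual code: the names GlyphAtlas, Rect and rect_for, and the assumption that every glyph fits a fixed-size cell, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


class GlyphAtlas:
    """One large bitmap divided into fixed-size cells, one cell per glyph."""

    def __init__(self, cell_width, cell_height, columns, rows):
        self.cell_width = cell_width
        self.cell_height = cell_height
        self.columns = columns
        self.rows = rows
        self.slots = {}  # glyph id -> cell index inside the big bitmap

    def rect_for(self, glyph_id):
        """Return the area of the big bitmap reserved for this glyph."""
        index = self.slots.get(glyph_id)
        if index is None:
            if len(self.slots) >= self.columns * self.rows:
                raise RuntimeError("atlas is full")
            index = len(self.slots)  # next free cell
            self.slots[glyph_id] = index
        row, col = divmod(index, self.columns)
        return Rect(col * self.cell_width, row * self.cell_height,
                    self.cell_width, self.cell_height)


# 32 x 32 cells of 16 x 16 pixels: 1024 glyphs in a single 512 x 512 bitmap.
atlas = GlyphAtlas(cell_width=16, cell_height=16, columns=32, rows=32)
first = atlas.rect_for("U+4E00")
second = atlas.rect_for("U+4E01")
```

The renderer would draw each glyph into its rectangle once and afterwards copy from that rectangle to the screen, so only one bitmap object is ever created regardless of how many glyphs are cached.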

The Mysteriously Disappearing riched20.dll

One day we could not compose any emails on that computer in Outlook Express. I found out that riched20.dll, which does text rendering, was missing. A quick look at Google's Usenet archive revealed that this is a common problem:

"Subject: Re: riched.dll missing - me too!"

So this DLL does indeed disappear mysteriously on Windows. The usual advice in such cases is to install the latest version of the DLL. This was not the last time I had to deal with it.

The Official Unicode Mailing List

In February 2002 I wanted to start a discussion on the official Unicode mailing list about what I believe are serious security problems in Unicode.

Off the list, I immediately received a direct reply from the Unicode Consortium saying that this was already in the archives and that there are no security problems with Unicode, as "security experts" had already examined it.

One of the things I wanted to figure out was how to solve the problem of irreversibility introduced by the Unicode Bidirectional Algorithm, UAX#9.

Why do I feel irreversibility is a problem?

The Unicode Bidirectional Algorithm is irreversible. In other words, the logical text can be reordered into visual order, but there is no way to guess what the logically ordered text is, just by looking at the visual text.
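A small Python sketch can make the loss of information concrete. It is a heavily simplified model, not an implementation of UAX#9: it assumes only strong characters (no neutrals, digits or explicit marks), uses uppercase letters to stand in for right-to-left characters, and applies only the level-based reversal step. The function names are hypothetical.

```python
def levels(text, base):
    """Assign an embedding level to each character (strong types only).

    Uppercase stands in for right-to-left letters, lowercase for
    left-to-right letters. base is 0 (LTR paragraph) or 1 (RTL paragraph).
    """
    result = []
    for ch in text:
        if ch.isupper():
            result.append(1)                     # RTL letters sit at level 1
        else:
            result.append(0 if base == 0 else 2)  # LTR inside RTL rises to 2
    return result


def reorder(text, base):
    """Logical to visual order: reverse runs from the highest level down."""
    chars = list(text)
    lv = levels(text, base)
    if not chars:
        return ""
    highest = max(lv)
    lowest_odd = min((l for l in lv if l % 2), default=highest + 1)
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if lv[i] >= level:
                j = i
                while j < len(chars) and lv[j] >= level:
                    j += 1
                chars[i:j] = chars[i:j][::-1]
                lv[i:j] = lv[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(chars)


visual_a = reorder("ABCdef", base=1)  # RTL paragraph
visual_b = reorder("defABC", base=0)  # LTR paragraph
```

In this model both calls produce the same visual string even though the logical strings differ, so a reader who sees only the visual result cannot tell which logical text produced it.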

This is a serious problem for digital signatures. If you want to sign a document, what you sign is the bit-stream, but what you see is the text. As no reverse algorithm is provided, you cannot possibly know what you are signing just by looking at the text.
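The gap between bytes and displayed text can be shown with a short Python sketch. It uses SHA-256 as a stand-in for the hashing step of a signature scheme, which is an assumption; the sample word and the helper name digest are illustrative only. Whether the two strings really look the same depends on the renderer.

```python
import hashlib

RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def digest(text):
    """Hash the UTF-8 bit-stream, which is what a signature would cover."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


plain = "\u0633\u0644\u0627\u0645"  # an Arabic word, no explicit marks
marked = RLE + plain + PDF           # same word wrapped in invisible marks

same_bytes = plain.encode("utf-8") == marked.encode("utf-8")
same_digest = digest(plain) == digest(marked)
```

The marks are invisible, so a typical renderer is likely to show both strings the same way on a line of their own, yet the bytes and therefore the digests differ.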

I showed the first example. This text could represent a problem for translators of Unix gettext .po files, where Arabic text is embedded in English text.

http://www.yudit.org/security/

Without RLE and PDF marks, the text will be rendered in a different order in the GUI than what you see in the .po file. An explicit embedding mark is needed to avoid ambiguity (though sometimes, as with TAB, even an explicit mark won't help).
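As a rough illustration, explicit marks could be added to a translated message like this. The helper names embed_rtl and has_explicit_embedding and the sample message are hypothetical; only the two code points and the standard unicodedata module are taken as given.

```python
import unicodedata

RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def embed_rtl(segment):
    """Wrap a right-to-left segment in an explicit RLE ... PDF pair."""
    return RLE + segment + PDF


def has_explicit_embedding(text):
    """Report whether the text carries both an RLE and a PDF mark."""
    classes = [unicodedata.bidirectional(ch) for ch in text]
    return "RLE" in classes and "PDF" in classes


arabic = "\u0645\u0644\u0641"  # sample Arabic word
unmarked = "File " + arabic + " not found"
marked = "File " + embed_rtl(arabic) + " not found"
```

A check like has_explicit_embedding could be run over the msgstr entries of a .po file to flag mixed-direction strings that rely on implicit ordering alone.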

Creating the screenshot for the second example on that page leads us to another story.

The riched20.dll Is In Trouble Again

I sent an email to my colleague. I was slightly hindered because my company had just suffered a Nimda virus attack. It was the first (and hopefully the last) time that this company was successfully attacked by a virus. I admit that at first I thought it had come through the Unicode mailing list, like this:

unicode#10564 or unicode#10559

Fortunately, on my Linux box it was trapped in my pine mailer, causing no harm. The sender was someone who had asked some questions on the list.

I have to apologize for having thought so. As it turned out it did NOT come from the mailing list. Someone accidentally picked it up from the web with Internet Explorer.

It took almost a week to get rid of the virus. Our system administrator distributed a paper listing all the steps. One of the steps was replacing riched20.dll. You can check this out.

Back To Unicode Mailing List

During this turmoil my colleague received my test email, so I could send my second example to the official Unicode mailing list.

The second example shows how the same ordering difference appears in certain applications that override the document embedding. Overriding document embedding is allowed by the Unicode Standard.

After the second example I received several very clear messages from the list indicating that I was unwanted on the official Unicode mailing list:

unicode#10746 or unicode#10751

I am not on the list any more. Still, I will continue to improve Yudit. Yudit now has Full Bidirectional Support. For better or worse.

You can find the documents related to this story locally at: http://www.yudit.org/bidi/reference/

This page will probably never exist: http://www.unicode.org/security/


Gaspar Sinai
Last updated: 2002-11-21
