Unicode text editors with full bidirectional support must
behave as if they implemented the official Unicode Bidirectional
Algorithm. This algorithm is a convoluted process in which,
over several passes, the logically ordered Unicode text is scanned
and finally reordered into illogical visual order.
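Before any reordering, the first pass gives every character a bidirectional class, and all later passes work only on those classes. The minimal Python sketch below uses the standard `unicodedata` module to print the class of one character of each kind used on this page. U+05D0 and U+0627 are stand-ins I picked for "HEBREW" and "ARABIC"; `bidi_class` is a hypothetical helper name.

```python
import unicodedata


def bidi_class(ch: str) -> str:
    """Return the Bidi_Class property value of one character."""
    return unicodedata.bidirectional(ch)


# One character of each kind used in the examples on this page.
# U+05D0 and U+0627 are hypothetical stand-ins for "HEBREW" and "ARABIC".
SAMPLES = "\u05d0\u0627a2-+/*%~ "

for ch in SAMPLES:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch):<20}  {bidi_class(ch)}")
```

Note that "-" and "+" are European separators (ES), "/" is a common separator (CS), "*" and "~" are plain neutrals (ON), and Hebrew letters are R while Arabic letters are AL. The surprises below follow from these small differences.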
This document describes the unexpected effects of the Unicode Bidirectional Algorithm (UAX #9). If your browser does not have full, bug-free support for bidirectional characters, you might not see what I want to show you; you might need a compliant browser.
I have no affiliation with Unicode Consortium. Never had, never will.
Logical | Visual |
-10% TEST ARABIC | TSET CIBARA -10% |
ARABIC -10% TEST | TSET %10- CIBARA |
Segment Separator | Its effect is well defined, but surprising. |
Boundary Neutral | Its location is not defined; it can pop up anywhere. |
For this HTML document I have to write it this way:
msgstr "سلام Gáspár محمد"
As you see, I cannot protect the text. If you set the Yudit editor’s Document Text Alignment to the right, you will see what the label will show: something totally different.
Surprise #1:
You might find it surprising that programs conforming to Unicode Standard Annex #9 must render the following text segments as you see them. (In the examples, HEBREW stands for Hebrew letters and ARABIC for Arabic letters; I also inserted a Right-to-Left Embedding mark so that you can see what is going on.)
Input: HEBREW ~~~23%%% HEBREW abc
Output:
Input: ARABIC ~~~23%%% ARABIC abc
Output:
Input: HEBREW 1*5 1-5 1/5 1+5
Output:
Input: ARABIC 1*5 1-5 1/5 1+5
Output:
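The difference between the HEBREW and ARABIC lines comes from the weak-type rules. The Python sketch below applies a simplified subset of the current UAX #9 rules W2 to W6 to a number that follows a Hebrew letter and a number that follows an Arabic letter. It assumes a single right-to-left level run with no explicit embeddings and no non-spacing marks, so W1 and the X rules are skipped; `weak_types` and the stand-in letters are hypothetical, not part of any library.

```python
import unicodedata


def weak_types(text: str) -> list[str]:
    """Resolve weak types with a simplified subset of UAX #9 (W2-W6).

    Sketch only: assumes one right-to-left level run (sos = R), no explicit
    embeddings and no non-spacing marks, so W1 and the X rules are skipped.
    """
    types = [unicodedata.bidirectional(ch) for ch in text]
    n = len(types)
    last_strong = "R"  # sos of the RTL embedding used in the examples
    for i, t in enumerate(types):  # W2: EN after AL becomes AN
        if t in ("L", "R", "AL"):
            last_strong = t
        elif t == "EN" and last_strong == "AL":
            types[i] = "AN"
    types = ["R" if t == "AL" else t for t in types]  # W3: AL becomes R
    for i in range(1, n - 1):  # W4: one separator between two numbers
        before, after = types[i - 1], types[i + 1]
        if types[i] == "ES" and before == after == "EN":
            types[i] = "EN"
        elif types[i] == "CS" and before == after and before in ("EN", "AN"):
            types[i] = before
    i = 0
    while i < n:  # W5: terminators next to a European number become EN
        if types[i] != "ET":
            i += 1
            continue
        j = i
        while j < n and types[j] == "ET":
            j += 1
        if (i > 0 and types[i - 1] == "EN") or (j < n and types[j] == "EN"):
            types[i:j] = ["EN"] * (j - i)
        i = j
    # W6: separators and terminators that are left become Other Neutral.
    return ["ON" if t in ("ES", "CS", "ET") else t for t in types]


HEBREW = "\u05d0"  # hypothetical stand-in for a Hebrew word (class R)
ARABIC = "\u0627"  # hypothetical stand-in for an Arabic word (class AL)

for letter in (HEBREW, ARABIC):
    for expr in ("1*5", "1-5", "1/5", "1+5"):
        # Skip the letter and the space; show only the number's types.
        print(unicodedata.name(letter).split()[0], expr,
              weak_types(letter + " " + expr)[2:])
```

After a Hebrew letter, "1-5", "1/5" and "1+5" each resolve to a single European-number run, while "*" stays neutral. After an Arabic letter the digits become Arabic numbers (W2), and W4 lets only the common separator "/" join them, so "-" and "+" are left neutral. Rule N1 then treats numbers as R, which makes a neutral between two numbers right-to-left, so it is displayed on the other side of the pair. The same rules explain "~~~23%%%": after Hebrew, W5 glues "%%%" to "23"; after Arabic it does not.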
I have checked this with the Java reference code from the Unicode Consortium.
The Unicode Bidirectional Algorithm is irreversible. In other words, the logical text can be reordered into visual order, but there is no way to tell what the logically ordered text is just by looking at the visual text.
This is a serious problem for digital signatures. If you want to sign a document, what you sign is the bit-stream, but what you see is the text. As no reverse algorithm is provided, you cannot possibly know what you are signing if you are just looking at the text.
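A two-character example makes this concrete. Under UAX #9 (rules W7, N1 and L2), in a left-to-right paragraph the logical string "ALEF, space, 1" and the logical string "1, space, ALEF" are both displayed as "1 א". The minimal Python sketch below, using the standard `hashlib` module with SHA-256 as an example digest, shows that the two byte streams, and therefore anything signed over them, still differ.

```python
import hashlib

ALEF = "\u05d0"  # HEBREW LETTER ALEF, bidi class R

# Two different logical strings. In a left-to-right paragraph, rules W7, N1
# and L2 of UAX #9 display both of them as "1" followed by a space and ALEF.
first = ALEF + " 1"
second = "1 " + ALEF

for text in (first, second):
    data = text.encode("utf-8")
    # The bytes (and their digest) are what a signature actually covers.
    print(data.hex(" "), hashlib.sha256(data).hexdigest()[:16])
```

A reader looking at the screen sees the same line twice, while the signer has signed two different documents.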
I tested Yudit and found that it is, probably, 100% compliant with the full Unicode Bidirectional Algorithm (UAX #9). I cannot prove this, because it is not possible to test it properly (the Unicode algorithm is inherently untestable). However,
I do not think that the UAX #9 algorithm is good. Moreover, I think it should be replaced with one that makes more sense. My clean-room implementation of the implicit algorithm mostly lies in
You can use it in your GNU programs. If the Unicode Consortium ever changes its mind, it would be very easy to replace that file.
So how much is this?
Input: HEBREW 10-2*5
Output:
If you don’t see the expected result here, it means your browser does not have full bidirectional support, or it is buggy. This means that you have been seeing these pages all wrong. You should download Yudit and type “howto bidi” in the command area of the editor.
Input: ARABIC 10-2*5
Output:
If you don’t see the expected result here, it means your browser does not have full bidirectional support, or it is buggy. This means that you have been seeing these pages all wrong. You should download Yudit and type “howto bidi” in the command area of the editor.
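To see where the two readings of 10-2*5 come from, the sketch below strings the implicit passes together for a single line: weak types (W2 to W7), neutrals (N1, N2), implicit levels (I1, I2) and reordering (L2). It is a simplified Python model, not the Unicode reference code or Yudit's implementation: the explicit rules (X1 to X10), W1, L1 and mirroring (L4) are left out, the Right-to-Left Embedding is modeled as paragraph level 1, and `resolve_levels`, `visual` and the stand-in words are hypothetical names.

```python
import unicodedata

KNOWN = {"L", "R", "AL", "EN", "AN", "ES", "CS", "ET", "ON", "WS"}
NEUTRALS = {"ON", "WS"}


def resolve_levels(text: str, para_level: int) -> list[int]:
    """Simplified UAX #9 for one line without explicit formatting characters.

    Implements W2-W7, N1-N2 and I1-I2 only; W1, the X rules and L1 are
    deliberately left out of this sketch.
    """
    # Classes this sketch does not model (NSM, BN, B, S, ...) become ON.
    types = [t if t in KNOWN else "ON"
             for t in (unicodedata.bidirectional(ch) for ch in text)]
    n = len(types)
    sos = "R" if para_level % 2 else "L"
    last = sos  # W2: EN after an Arabic letter becomes AN
    for i, t in enumerate(types):
        if t in ("L", "R", "AL"):
            last = t
        elif t == "EN" and last == "AL":
            types[i] = "AN"
    types = ["R" if t == "AL" else t for t in types]  # W3
    for i in range(1, n - 1):  # W4: single separator between numbers
        a, b = types[i - 1], types[i + 1]
        if types[i] == "ES" and a == b == "EN":
            types[i] = "EN"
        elif types[i] == "CS" and a == b and a in ("EN", "AN"):
            types[i] = a
    i = 0
    while i < n:  # W5: terminators next to EN become EN
        if types[i] != "ET":
            i += 1
            continue
        j = i
        while j < n and types[j] == "ET":
            j += 1
        if (i > 0 and types[i - 1] == "EN") or (j < n and types[j] == "EN"):
            types[i:j] = ["EN"] * (j - i)
        i = j
    types = ["ON" if t in ("ES", "CS", "ET") else t for t in types]  # W6
    last = sos  # W7: EN after L (or an LTR sos) becomes L
    for i, t in enumerate(types):
        if t in ("L", "R"):
            last = t
        elif t == "EN" and last == "L":
            types[i] = "L"
    i = 0
    while i < n:  # N1/N2: resolve runs of neutrals
        if types[i] not in NEUTRALS:
            i += 1
            continue
        j = i
        while j < n and types[j] in NEUTRALS:
            j += 1
        # N1: numbers count as R; sos == eos because there are no embeddings.
        before = ("L" if types[i - 1] == "L" else "R") if i > 0 else sos
        after = ("L" if types[j] == "L" else "R") if j < n else sos
        types[i:j] = [before if before == after else sos] * (j - i)
        i = j
    if para_level % 2 == 0:  # I1
        return [para_level + {"R": 1, "AN": 2, "EN": 2}.get(t, 0) for t in types]
    return [para_level + (1 if t in ("L", "EN", "AN") else 0) for t in types]  # I2


def visual(text: str, para_level: int) -> str:
    """Reorder one line with rule L2 (no mirroring, no whitespace reset)."""
    levels = resolve_levels(text, para_level)
    order = list(range(len(text)))
    if levels:
        lowest_odd = min(levels) | 1
        for level in range(max(levels), lowest_odd - 1, -1):
            i = 0
            while i < len(order):
                if levels[order[i]] < level:
                    i += 1
                    continue
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = order[i:j][::-1]  # reverse the run
                i = j
    return "".join(text[k] for k in order)


HEBREW = "\u05d0\u05d1"  # hypothetical stand-in word, bidi class R
ARABIC = "\u0627\u0628"  # hypothetical stand-in word, bidi class AL

for word in (HEBREW, ARABIC):
    # Paragraph level 1 models the Right-to-Left Embedding used above.
    shown = visual(word + " 10-2*5", para_level=1)
    print(shown[:6])  # digits only; a bidi terminal would reorder the letters
```

In this model the Hebrew line keeps "10-2" together as one European number and is displayed as "5*10-2", while after the Arabic word every number stands alone and the line is displayed as "5*2-10". The same functions also reproduce the irreversibility example: `visual("\u05d0 1", 0)` and `visual("1 \u05d0", 0)` return the same string.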
It is your choice. They both have 0 values, literally.
Gaspar Sinai
Last updated: 2002-11-21