Editing Bidirectional Documents With Yudit
This page is encoded in UTF-8. If your browser did not detect
this, you may need to set the encoding manually.
Please refer to the manual of your browser.
From version 2.7, Yudit should show bidirectional text
just like any other Unicode application that implements
the Unicode Bidirectional Algorithm (UAX#9).
Paragraphs with initial directionality LR, like English
text, will be aligned to the left, while paragraphs with RL
initial directionality will be aligned to the right.
Because the Unicode Standard allows higher-level protocols to
impose a Document Embedding, Yudit can enforce an LR
or RL embedding on the whole document if the user
sets it with the text embedding button. This will
force left or right alignment on the whole text.
What is implicit bidirectional behavior?
Every character in Unicode belongs to one of several
bidirectional classes. Depending on these character properties,
the characters in a document must be reordered into a visual
order dictated by a rather convoluted algorithm in UAX#9.
By implicit bidirectional behavior I mean behavior
that relies purely on the bidirectional class property
of the characters.
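The bidirectional class of a character can also be inspected outside
Yudit. The following is a minimal sketch, assuming Python 3 and its
standard unicodedata module; the helper name bidi_classes is made up
for this illustration and has nothing to do with Yudit itself.

```python
import unicodedata

def bidi_classes(text):
    """Return a list of (character, bidirectional class) pairs."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Latin letters report class L, Arabic letters AL,
# the space WS and the exclamation mark ON.
for ch, cls in bidi_classes("He \u0633\u0644\u0627\u0645!"):
    print(f"U+{ord(ch):04X}  {cls}")
```

The class names printed here (L, AL, WS, ON and so on) are the ones
UAX#9 uses in its rules.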
How to invoke implicit bidi?
You don’t need to do anything, just type:
He said “سلام!”
Please note that I cheated here: I added an RLM (Right-to-Left Mark,
U+200F) at the end. I wanted to make the text more digestible in
this English document. This mark is visible in the editor window,
but it will not appear when printing or when used in labels.
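The same trick can be applied programmatically. This is only a sketch,
assuming Python 3; with_trailing_rlm is a hypothetical helper name.
The RLM is an invisible character that itself has the strong
right-to-left class R, so the trailing punctuation ends up between two
right-to-left characters.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK

def with_trailing_rlm(text):
    """Append an RLM so that trailing neutral characters are followed
    by a strong right-to-left character."""
    return text + RLM

quote = with_trailing_rlm("He said \u201c\u0633\u0644\u0627\u0645!\u201d")
print(unicodedata.name(quote[-1]))           # name of the added mark
print(unicodedata.bidirectional(quote[-1]))  # its bidirectional class
```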
What is explicit embedding and override?
In addition to the inherent bidirectional properties of the
characters, Unicode allows text between certain markers either to
be rendered strictly left to right or right to left (override), or
to be laid out as if its embedding context were left to right or
right to left (embedding).
These markers can be nested. The PDF (Pop Directional Format)
marker restores the previous embedding state.
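Nesting can be pictured as a stack: each LRE, RLE, LRO or RLO pushes a
state and each PDF pops one. The sketch below, assuming Python 3,
only counts the depth and does not do any reordering; the function
name nesting_depth is made up for this example. It treats a PDF with
nothing to pop as ignored, which is my reading of UAX#9.

```python
OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
PDF = "\u202c"                                      # POP DIRECTIONAL FORMATTING

def nesting_depth(text):
    """Return (deepest nesting reached, levels still open at the end)."""
    depth = 0
    deepest = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
            deepest = max(deepest, depth)
        elif ch == PDF and depth > 0:
            depth -= 1  # restore the previous embedding state
    return deepest, depth

# RLE ... LRO ... PDF ... PDF: two levels deep, everything closed again
print(nesting_depth("a\u202bb\u202dc\u202cd\u202ce"))
```

A non-zero second value means that some marker was never closed with
a PDF.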
- Directional Override
RLO (Right to Left Override) Embedded Text PDF (Pop Directional Format)
LRO (Left to Right Override) Embedded Text PDF (Pop Directional Format)
This gives the enclosed text an explicit LR or RL directionality,
regardless of the bidirectional properties of its characters. However,
this explicit directionality is (unfortunately) not used when the
initial directionality is determined, so your text might not be
aligned as you expect.
According to UAX#9 P2:
In each paragraph, find the first character of type L, AL, or R.
Because paragraph separators delimit text in this algorithm,
this will generally be the first strong character after a
paragraph separator or at the very beginning of the text.
Note that the characters of type LRE, LRO, RLE, RLO are
ignored in this rule. This is because typically they are used
to indicate that the embedded text is the opposite direction
than the paragraph level.
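The rule quoted above can be sketched in a few lines. This is an
illustration under assumptions, not Yudit's implementation: it uses
Python 3 and the standard unicodedata module, the name
paragraph_direction is hypothetical, and it handles only the part of
P2 quoted here (later revisions of UAX#9 add more cases).

```python
import unicodedata

def paragraph_direction(paragraph):
    """Return 'LR' or 'RL' according to the first strong character,
    or None if the paragraph contains no strong character."""
    for ch in paragraph:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "LR"
        if cls in ("R", "AL"):
            return "RL"
        # LRE, LRO, RLE, RLO, PDF and neutrals are simply skipped
    return None

# An RLO override does not change the initial directionality:
print(paragraph_direction("\u202eNO WAY!\u202c"))
```

Because the RLO itself is skipped, the Latin text inside the override
still decides that the paragraph is LR, which is why the alignment
may not be what you expect.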
- Directional Embedding
RLE (Right to Left Embedding) Embedded Text PDF (Pop Directional Format)
LRE (Left to Right Embedding) Embedded Text PDF (Pop Directional Format)
This encloses an embedded text. Embeddings are supposed to give the
enclosed text some protection from the embedding context. The text in
the embedding is (in most cases) rendered as if its initial embedding
were RL or LR. Please note that some characters make this mission
impossible: in fact, RLE and LRE are of little use if your text
contains those characters. (Should they be forbidden? Read on.)
In Yudit you do not need to care about the LRE, RLE, RLO, LRO and PDF
markers: they are completely hidden. Your embedded text will have a
brighter or darker background, so you can tell the embedding range.
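Outside Yudit the markers are ordinary (invisible) characters that you
have to place yourself. A minimal sketch, assuming Python 3; the
helper names embed and override are invented for this example.

```python
LRE, RLE = "\u202a", "\u202b"  # Left/Right to Left... Embedding
LRO, RLO = "\u202d", "\u202e"  # Left to Right / Right to Left Override
PDF = "\u202c"                 # Pop Directional Format

def embed(text, direction):
    """Wrap text in LRE...PDF ('LR') or RLE...PDF ('RL')."""
    opener = {"LR": LRE, "RL": RLE}[direction]
    return opener + text + PDF

def override(text, direction):
    """Wrap text in LRO...PDF ('LR') or RLO...PDF ('RL')."""
    opener = {"LR": LRO, "RL": RLO}[direction]
    return opener + text + PDF

sentence = "He said: " + embed("\u201c\u0633\u0644\u0627\u0645!\u201d", "RL")
shouted = "I said " + override("\u201cNO WAY!\u201d", "RL") + "."
print([f"U+{ord(c):04X}" for c in sentence if c in (RLE, PDF)])
```

Whether the result is displayed as intended depends on the
application that shows it.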
Unicode allows for 3 levels of support for the bidirectional algorithm:
1. No bidirectional formatting. This implies that the system
does not visually interpret characters from right-to-left
scripts.
2. Implicit bi-directionality. The implicit bidirectional algorithm
and the directional marks RLM and LRM are supported.
3. Full bi-directionality. The implicit bidirectional algorithm,
the implicit directional marks, and the explicit directional
embedding codes are supported: RLM, LRM, LRE, RLE, LRO, RLO, PDF.
Yudit now has full bidirectional support (level 3).
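As a rough illustration of the three levels, the sketch below guesses
which level an application needs in order to display a given text as
intended. This classification is my own simplification, not part of
the standard; it assumes Python 3, and the name support_level_needed
is hypothetical.

```python
import unicodedata

EXPLICIT_CODES = {"\u202a", "\u202b", "\u202c", "\u202d", "\u202e"}
DIRECTIONAL_MARKS = {"\u200e", "\u200f"}  # LRM, RLM

def support_level_needed(text):
    """Return 1, 2 or 3 following the list of support levels above."""
    if any(ch in EXPLICIT_CODES for ch in text):
        return 3  # explicit embedding or override codes present
    if any(ch in DIRECTIONAL_MARKS
           or unicodedata.bidirectional(ch) in ("R", "AL")
           for ch in text):
        return 2  # right-to-left characters or directional marks
    return 1      # nothing that needs bidirectional handling

print(support_level_needed("Hello"))
print(support_level_needed("He said \u0633\u0644\u0627\u0645"))
print(support_level_needed("\u202bHello\u202c"))
```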
How to do explicit direction override?
To override the implicit directionality of characters, press
<Ctrl><D> (Override Direction) to change direction, then simply
continue typing. You can get out of the override with <Ctrl><Y>
(Yield Direction). You can clearly distinguish the embedded text:
I said “NO WAY!”.
How to do simple explicit embedding?
Similarly, embedding right-to-left text in a left-to-right document
needs <Ctrl><E> (Embedding Override). This is useful, for instance,
if you want to say:
He said: “سلام!”
Without the Right-to-Left embedding this would look pretty bad in
this English document:
He said “سلام!”
I already have a text that I need to embed/un-embed. How to
do that?
Select the text before embedding or un-embedding it. A selection can
be made, for instance, with <Alt> and the arrow keys. After selecting
with the keys, keep <Alt> pressed and press <D> for Direction Override
or <E> for Embedding Override. You can bring the text back to
no embedding level with <Alt><Y> (Yield Embedding).
What is document text embedding?
Yudit can enforce an initial embedding level to the whole document.
When Yudit is started the initial embedding is reset to none.
The text is also saved without initial embedding enforcement tags.
When no initial embedding is enforced, your text can show up
aligned to the left or to the right, depending on the natural
paragraph embedding level.
I want to embed LR text but my embedding arrow is RL
The embedding arrows on the tool-bar always point in the direction
opposite to the current embedding, that is, the embedding of the
context where the cursor is. This makes the operation faster and
less error-prone. It is usually not desirable to embed a text in an
LR document as LR. However, you can do it with this trick:
if you want to embed LR text in a document with LR embedding,
change the Document Text Embedding to RL. Now you can make
the LR embedding.
Important Notes
In po file translations you might want to consider embedding your
RL text with an explicit RLO, so that you see what you will get
on that label:
Without explicit embedding:
msgstr "سلام Gáspár, محمد"
With explicit embedding, you will see what the label will eventually
show:
msgstr "سلام Gáspár, محمد"
Please note that most applications do not support explicit embedding,
so use it sparingly. Moreover, explicit embedding does not
save you from the effects of the Unicode Bidirectional Algorithm.
You have this text:
msgstr "سلام Gáspár محمد"
I put the whole thing into RL embedding marks, because I want to see
them this way, in my RL text label. It works. But what if I replace the
leftmost space with a tab?
msgstr "سلام Gáspár محمد"
In this HTML document I have to use the pre tag to show it:
msgstr "سلام Gáspár محمد"
Now try to put this in a label. (Try pressing the Document Text
Embedding button in Yudit for the same effect.) Now you
see what you will get in that label. To tell the truth, nothing
saves you from these effects of the Unicode Bidirectional Algorithm.
If you want to see why this happens, please read
Surprise Effects on this server.
Fortunately, if you use gettext you can write the '\t' escape
sequence for TAB. So when translating a po file, please always use
'\t', like this:
msgstr "سلام Gáspár\tمحمد"
But in short: do not use literal segment separators in your po
translation text. For non-computer, non-gettext text you are on
your own.
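A translation can be checked for literal segment separators before it
goes into a po file. The following is a sketch under assumptions:
Python 3, the standard unicodedata module, and two hypothetical helper
names. The TAB character has bidirectional class S (segment
separator), while the two characters of the written-out '\t' escape
do not.

```python
import unicodedata

def segment_separators(text):
    """Return the positions of characters with bidi class S (e.g. TAB)."""
    return [i for i, ch in enumerate(text)
            if unicodedata.bidirectional(ch) == "S"]

def escape_tabs(text):
    """Replace each literal TAB by the two characters backslash and t."""
    return text.replace("\t", "\\t")

raw = "\u0633\u0644\u0627\u0645 G\u00e1sp\u00e1r\t\u0645\u062d\u0645\u062f"
print(segment_separators(raw))               # position of the literal TAB
print(segment_separators(escape_tabs(raw)))  # empty once the TAB is escaped
```

This only covers TAB; other separators would need their own escape
sequence, if gettext offers one.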
Comparing With Other Applications
I tried to compare Yudit bidi to other applications, but
the applications had problems even with this simple text:
Hello العربية 14محمد RLTXT nothing
I may try it again at a later time.
Links
Gaspar Sinai
Last updated: 2002-11-20