From czyborra@dds.nl Tue Aug 4 15:14:41 1998
Date: Tue, 4 Aug 1998 15:14:36 +0200 (CEST)
From: Roman Czyborra
To: Asmus Freytag, Misha Wolf, Unicode Consortium, Unicode Errata
bcc: Roman Czyborra
Subject: typo: gapOffset != AE00
PGP-Fingerprint: 2708E38751D3FB90456AC169A49BE6E6 (1024/87329995)
User-Agent: Pine/3.96 (private offline X notebook; Linux 2.0.32 i586)
Organization: =?UTF-8?Q?Technische_Universit=C3=A4t_Berlin?=
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Status: RO
X-Status: A

http://www.unicode.org/unicode/reports/tr6.html seems to contain a typo
in Table X-3: the gapOffset should be defined as AC00 instead of AE00
because otherwise you get 68 * 80 + AE00 == E200 != E000.
Please apply the following patch:

*** tr6.html 1998/01/28 08:17:06 6.10
--- tr6.html 1998/08/04 13:04:48
***************
*** 397,403 ****
68..A7
! x*80+AE00
half-blocks from U+E000 to U+FF80
--- 397,403 ----
68..A7
! x*80+AC00
half-blocks from U+E000 to U+FF80

Cheers, Roman http://czyborra.com/

From czyborra@dds.nl Tue Aug 4 16:04:37 1998
Date: Tue, 4 Aug 1998 16:04:33 +0200 (CEST)
From: Roman Czyborra
To: Asmus Freytag, Misha Wolf, Unicode Consortium, Unicode Errata
Subject: hbyte & 1FFF ???

I think that anding a hbyte with 1F is better than & 1FFF:

*** tr6.html 1998/08/04 13:04:48 6.11
--- tr6.html 1998/08/04 13:46:58
***************
*** 449,455 ****
The window index n is given by the top 3 bits of hbyte. The window offset is calculated from the remaining thirteen bits of hbyte and lbyte as follows:

!

offset = 10000 + (80 * ((hbyte & 1FFF) * 100 + lbyte))

where & is the bitwise AND operator and all values are in hexadecimal notation. After an extended Window is defined each subsequent byte in the range 80 to FF represents
--- 449,455 ----
The window index n is given by the top 3 bits of hbyte. The window offset is calculated from the remaining thirteen bits of hbyte and lbyte as follows:

!

offset = 10000 + (80 * ((hbyte & 1F) * 100 + lbyte))

where & is the bitwise AND operator and all values are in hexadecimal notation. After an extended Window is defined each subsequent byte in the range 80 to FF represents

From czyborra@dds.nl Tue Aug 4 17:06:18 1998
Date: Tue, 4 Aug 1998 17:06:14 +0200 (CEST)
From: Roman Czyborra
To: Asmus Freytag, Misha Wolf, Unicode Consortium, Unicode Errata
Subject: SC7 does not switch to Cyrillic, but SC2

According to table X-5 of http://www.unicode.org/unicode/reports/tr6.html
Russian should be using the default SC2 instead of SC7:

*** tr6.html 1998/08/04 13:46:58 6.12
--- tr6.html 1998/08/04 14:56:16
***************
*** 808,815 ****

Russian

!

Russian can use the default position of window 7. The first byte of the compressed data
! is the tag SC7.

Unicode values (6 characters):

--- 808,815 ----

Russian

!

Russian can use the default position of window 2. The first byte of the compressed data
! is the tag SC2.

Unicode values (6 characters):

***************
*** 817,823 ****

Compressed (7 bytes):

!
17 9C BE C1 BA B2 B0

All Features

--- 817,823 ----

Compressed (7 bytes):

!
12 9C BE C1 BA B2 B0

All Features
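
The corrected Russian example can be checked with a few lines of Python. What follows is only a minimal sketch, not the decoder from TR6 or from scsu.c: the function name `decode_single_byte_mode` is hypothetical, it handles only the SC0..SC7 tags plus plain and window bytes, and the default window offsets are assumed to be those of table X-5.

```python
# Default positions of the eight dynamic windows (assumed from TR6 table X-5).
DEFAULT_DYNAMIC_OFFSETS = [0x0080, 0x00C0, 0x0400, 0x0600,
                           0x0900, 0x3040, 0x30A0, 0xFF00]

SC0 = 0x10  # SC0..SC7 are the tag bytes 0x10..0x17


def decode_single_byte_mode(data: bytes) -> str:
    """Decode the small SCSU subset used by the Russian example (hypothetical helper)."""
    offsets = list(DEFAULT_DYNAMIC_OFFSETS)
    active = 0  # window 0 is active at the start of a stream
    out = []
    for byte in data:
        if SC0 <= byte <= SC0 + 7:
            # SCn: locking shift to dynamic window n
            active = byte - SC0
        elif byte >= 0x80:
            # Bytes 80..FF address the active dynamic window
            out.append(chr(offsets[active] + byte - 0x80))
        elif byte >= 0x20 or byte in (0x00, 0x09, 0x0A, 0x0D):
            # Pass-through bytes (ASCII and the permitted controls)
            out.append(chr(byte))
        else:
            raise ValueError(f"tag byte {byte:#04x} is outside this sketch")
    return "".join(out)


corrected = bytes.fromhex("12 9C BE C1 BA B2 B0")  # SC2, then six window bytes
printed = bytes.fromhex("17 9C BE C1 BA B2 B0")    # SC7, as printed in the report

print(" ".join(f"U+{ord(c):04X}" for c in decode_single_byte_mode(corrected)))
print(" ".join(f"U+{ord(c):04X}" for c in decode_single_byte_mode(printed)))
```

Under these assumptions the SC2 form yields U+041C U+043E U+0441 U+043A U+0432 U+0430, which is Cyrillic, while the SC7 form lands in the window at FF00 and yields U+FF1C onward.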

From czyborra@dds.nl Fri Aug 14 13:38:25 1998
Date: Fri, 14 Aug 1998 13:38:21 +0200 (CEST)
From: Roman Czyborra
To: Asmus Freytag, Misha Wolf, Unicode Consortium, Unicode Errata
Subject: TR6 - error in contrived example and SCSU.java

> Russian should be using the default SC2 instead of SC7:

The third example in http://www.unicode.org/unicode/reports/tr6.html
is also wrong because

=41=DF=11=DF=03=01=03=DF=19=01=DF=0E=DF=DF=0F=01=DF=F0=F0=00=F1=FF=FF=FF

expands to with your sample implementation

java CompressMain /expand /dev/stdin

and to with my own implementation but not to the mentioned in your technical report. Manually, I get

=41    pass as U+0041
=DF    dynamicOffset[0] + DF - 80 = 0080 + 5F = U+00DF
=11    SC1 change to activewindow = 1 (locking shift)
=DF    dynamicOffset[1] + DF - 80 = 00C0 + 5F = U+011F
=03    SQ2 single quote 2
=01    staticOffset[2] + 01 = 0100 + 01 = U+0101
=03    SQ2 single quote 2
=DF    dynamicOffset[2] + DF - 80 = 0400 + 5F = U+045F
=19    SD1 define window 1
=01    dynamicOffset[1] := winOffset [1] = 0080
=DF    dynamicOffset[1] + DF - 80 = 0080 + 5F = U+00DF
=0E    SQU single quote Unicode
=DF=DF U+DFDF
=0F    SCU change to Unicode mode
=01=DF U+01DF
=F0    UQU quote Unicode
=F0=00 U+F000
=F1    UDX define extended
=FF=FF dynamicOffset[7] := 10000 + FFF80 = 10FF80
=FF    dynamicOffset[7] + FF - 80 = 10FF80 + 7F = U+10FFFF

which is what I implemented. I think that the

static final int initialDynamicOffset[] = {
0x0080, // Latin-1
! 0x0100, // Latin Extended A

in ftp://ftp.unicode.org/Public/PROGRAMS/SCSU/SCSU.java has to be brought in sync with table X-5 Default Positions for Dynamically Positioned Windows:

static final int initialDynamicOffset[] = {
0x0080, // Latin-1
!
0x00C0, // combined partial Latin-1/-A

Greetings, Roman http://czyborra.com/

From czyborra@dds.nl Sat Aug 15 18:54:09 1998
Date: Sat, 15 Aug 1998 18:54:06 +0200 (CEST)
From: Roman Czyborra
To: Asmus Freytag, Misha Wolf, Unicode Consortium, Unicode Errata
Subject: Re: TR6 - error in contrived example and SCSU.java

I have just added the joining of UTF-16 surrogates to my implementation
at http://czyborra.com/scsu/ and noticed another oddity in your example:

> =19 SD1 define window 1
> =01 dynamicOffset[1] := winOffset [1] = 0080
> =DF dynamicOffset[1] + DF - 80 = 0080 + 5F = U+00DF
> =0E SQU single quote Unicode
> =DF=DF U+DFDF
> =0F SCU change to Unicode mode
> =01=DF U+01DF

U+DFDF is an unpaired low-surrogate. Unicode conformance requirement C4
says that a process shall not interpret an unpaired high- or
low-surrogate as an abstract character. What else are we supposed to do
with it? My implementation outputs now as if there had been a .

From support@likasoft.com Mon Sep 29 14:49:26 2014
Message-ID: <25B1C3A3DF954B4EAD82E23D37080654@a>
From: "Likasoft - Support"
Subject: SCSU
Date: Mon, 29 Sep 2014 09:49:36 -0400

Dear Roman,

I found typo in your "scsu.c" file too. It is in "win[256]" table. You
wrote 0x3800, but it must be 0x3380. It seems that some other
programmers use your sources in some text editors/converters with this
typo, so it may decode incorrectly sometimes.

Bye.

From roman@czyborra.com Tue Sep 30 14:26:35 2014
Date: Tue, 30 Sep 2014 14:26:34 +0200 (CEST)
From: Roman Czyborra
To: Likasoft - Support
cc: roman@czyborra.com
Subject: Re: SCSU
In-Reply-To: <25B1C3A3DF954B4EAD82E23D37080654@a>
References: <25B1C3A3DF954B4EAD82E23D37080654@a>

2014-09-30[Di]14:22 Roman Czyborra read that
2014-09-29[Mo]15:49 Likasoft - Support wrote <25B1C3A3DF954B4EAD82E23D37080654@a>:

> Dear Roman,

Dear Eugene?

> I found typo in your "scsu.c" file too. It is in "win[256]" table. You
> wrote 0x3800, but it must be 0x3380. It seems that some other
> programmers use your sources in some text editors/converters with this
> typo, so it may decode incorrectly sometimes.

Well, that is an embarrassing typo! Thank you so much for finding it!
After 16 years! I have just corrected http://czyborra.com/scsu/scsu.c
and am asking myself: do you want any finders' names to be thanked
personally?
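
The offset arithmetic behind these reports can be rechecked with a short Python sketch. It is an illustration under stated assumptions, not the code of scsu.c or SCSU.java: the names `window_offset` and `extended_window` are hypothetical, the 01..67 range mapping to x*80 is assumed from the TR6 window offset table rather than stated in the messages, and the reserved and fixed-position indexes are left out.

```python
GAP_START = 0x68     # first index of the 68..A7 range in Table X-3
GAP_OFFSET = 0xAC00  # corrected gapOffset; the report printed AE00


def window_offset(x: int, gap_offset: int = GAP_OFFSET) -> int:
    """Offset selected by an SDn/UDn argument byte (hypothetical helper)."""
    if 0x01 <= x <= 0x67:
        # Assumed from the TR6 table: half-blocks from U+0080 to U+3380
        return x * 0x80
    if GAP_START <= x <= 0xA7:
        # Half-blocks from U+E000 to U+FF80
        return x * 0x80 + gap_offset
    raise ValueError(f"index {x:#04x} is not covered by this sketch")


def extended_window(hbyte: int, lbyte: int):
    """Return (window index, offset) for an SDX/UDX argument pair."""
    index = hbyte >> 5  # top 3 bits of hbyte
    # Remaining 5 bits of hbyte plus the 8 bits of lbyte: thirteen bits in all
    offset = 0x10000 + 0x80 * ((hbyte & 0x1F) * 0x100 + lbyte)
    return index, offset


print(f"{window_offset(0x68):04X}")          # with AC00
print(f"{window_offset(0x68, 0xAE00):04X}")  # with the misprinted AE00
print(f"{window_offset(0x67):04X}")          # last entry of the x*80 range
index, offset = extended_window(0xFF, 0xFF)
print(index, f"{offset:X}")
```

With AC00 the 68..A7 range starts at E000 and ends at FF80, as the table text says, whereas AE00 would start it at E200. The value 0x3380 is what x*80 gives for x = 67; the messages do not say which entry of win[256] held the 0x3800 typo, so that link is an inference. The pair FF FF selects window 7 at 10FF80, so a following byte FF reaches 10FFFF.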