{{tag>category:"OpenInsight" author:"Jim Vaughan" author:"Mike Ruane" author:"j Vaughan" author:"Steve Epstein" author:"Oystein Reigem"}}



==== Japanese char set (OpenInsight) ====
=== At 17 OCT 2001 09:01:28PM Jim Vaughan wrote: ===
 
 
<QUOTE>
Does the new 32-bit stuff have any support for the Japanese character set?


If so would it support this character set in the menus, forms and data?
</QUOTE>
----

=== At 18 OCT 2001 07:28AM Mike Ruane wrote:  ===

<QUOTE>Jim-


We're looking into it- as well as Chinese. One of the problems is that we don't speak Chinese or Japanese and expect trouble installing those versions of Windows.


Mike

</QUOTE>

----

=== At 18 OCT 2001 01:32PM j Vaughan wrote:  ===

<QUOTE>What kind of time frame are we looking at? 

</QUOTE>

----

=== At 19 OCT 2001 10:47PM Jim Vaughan wrote:  ===

<QUOTE>I know it's hard to guess how long something like this might take, but ... I need to know. We have a customer in Japan that would like to buy but needs the Japanese char set. 


Give me a best case worst case. If you think it can be done it will take from.... to....


Thanks.

</QUOTE>

----

=== At 22 OCT 2001 07:18AM Mike Ruane wrote:  ===

<QUOTE>Jim-


I have a new machine I can test it on, and someone who can help me get it installed. I should have some more details by next week.


Mike

</QUOTE>

----

=== At 22 OCT 2001 04:22PM j Vaughan wrote:  ===

<QUOTE>You guys are great. 


I look forward to hearing how it goes. 

</QUOTE>

----

=== At 29 OCT 2001 01:05PM Jim Vaughan wrote:  ===

<QUOTE>I just heard from my customer, they are meeting next week. 


Would it be possible to know if this is going to be available by then?

</QUOTE>

----

=== At 29 OCT 2001 02:26PM Mike Ruane wrote:  ===

<QUOTE>Jim-


We're formatting the machine today.


Mike

</QUOTE>

----

=== At 29 OCT 2001 03:29PM Jim Vaughan wrote:  ===

<QUOTE>Great, keep me updated. 

</QUOTE>

----

=== At 29 OCT 2001 04:25PM Steve Epstein wrote:  ===

<QUOTE>Dear Jim and Mike,


I have asked the same question.


I actually have a Japanese WIN2000 machine from our clients in Japan.  Any testing I can do would be appreciated.  I have the fonts, et al.


Steve

</QUOTE>

----

=== At 29 OCT 2001 05:24PM Mike Ruane wrote:  ===

<QUOTE>Guys-


Thanks-

First blush seems to be a no, as we need Unicode, which would destroy our data since we make heavy use of ASCII 251 to 255 as our system delimiters.


Mike

</QUOTE>
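
The collision described above can be illustrated with a small Python sketch. It assumes that "Unicode" here means UTF-16, the form Windows uses natively; the post does not say which encoding was tested. The delimiter range 251 to 255 comes from the post, while the sample string and the helper name `colliding_bytes` are made up for illustration.

```python
# Byte values the post says are used as system delimiters.
DELIMITER_BYTES = set(range(251, 256))


def colliding_bytes(text, encoding):
    """Return the sorted delimiter-range byte values found in the encoded text."""
    return sorted(set(text.encode(encoding)) & DELIMITER_BYTES)


# Hypothetical sample: two half-width katakana characters (U+FF76, U+FF85).
sample = "\uff76\uff85"

for enc in ("utf-16-le", "utf-16-be"):
    # Any non-empty result means the stored text contains a byte that
    # a delimiter-based file system would misread as a delimiter.
    print(enc, colliding_bytes(sample, enc))
```

Both characters have 0xFF as one of their two UTF-16 bytes, so a record holding them as raw UTF-16 would appear to contain extra byte-255 delimiters.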

----

=== At 30 OCT 2001 10:27AM Jim Vaughan wrote:  ===

<QUOTE>So what does that mean, do you have any other avenues to pursue?

</QUOTE>

----

=== At 30 OCT 2001 06:25PM Oystein Reigem wrote:  ===

<QUOTE>That must be the next big project. After the 32-bit version. To rid OI of those troublesome delimiters.


Just trying to make myself popular.


- Oystein -

</QUOTE>

----

=== At 04 NOV 2001 03:57PM j Vaughan wrote:  ===

<QUOTE>So this is no, for now? Or no forever? 

If it's no for now, when in the future might it be available?


I just need to give my customer an answer, even if it's one they don't like.

</QUOTE>

----

=== At 05 NOV 2001 06:42AM Oystein Reigem wrote:  ===

<QUOTE>Mike,


It would be nice if Unicode could be implemented in OpenInsight and kill dead the international-characters-versus-delimiters problem. But there are many questions on the way. I assume you've looked at some of them already.


There are many different Unicode encoding formats. Some of them are fixed-length (1, 2, 3 or 4 bytes per character), some variable (characters with a mix of different lengths).


I believe there are two basic alternatives if one wants to implement a multi-byte character encoding system in a database system like OpenInsight, where special characters or byte values are used to delimit various units of data during storage and computing.


One is to use a fixed-length character encoding format and let the delimiters be multi-byte too. This means among other things that the file system must be rewritten to handle multi-byte characters instead of single-byte characters. I don't expect that can be done overnight.


The other is to handle multi-byte encoded text, as much as possible, like any other byte sequence, and keep the old single-byte delimiters. But then one must choose an encoding format that avoids collisions with the delimiters. E.g., with a 2-byte encoding format, neither of the 2 bytes may ever be in the range 250-255.


But is the latter possible? Is there a Unicode encoding format (e.g., one that can be used for Japanese) where no byte is in the range 250-255? I believe no.


But there //are// formats where certain //other// byte values never occur. E.g., the UTF-8 2-, 3- and 4-byte encodings always have byte values with the highest bit set to 1 (to distinguish them from the single-byte UTF-8 encoding, which is plain old 7-bit ASCII). So perhaps by using that old trick with the bi-directional CHARMAP it's possible after all? E.g., shunt 250-255 down by 128.


Next question is how comparisons and sorting can be done on multi-byte data.


- Oystein -


PS. I don't know //that// much about Unicode.


But I have colleagues who know a bit more.


And there's the Unicode website.


</QUOTE>
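
The second alternative in the post above (keep single-byte delimiters and pick an encoding that never produces them) can be checked empirically. The Python sketch below is only an illustration: it enumerates every byte value that Python's UTF-8 codec emits for code points up to U+10FFFF, then round-trips Japanese fields joined by byte 254. Treating byte 254 as a field delimiter, and the names `FIELD_MARK`, `join_fields` and `split_fields`, are assumptions for this example, not OpenInsight APIs. The last lines touch on the sorting question: comparing UTF-8 bytes gives code point order, which is not the same as linguistic collation.

```python
DELIMITER_BYTES = set(range(250, 256))  # range quoted in the post
FIELD_MARK = b"\xfe"  # byte 254; assumed here to act as a field delimiter


def utf8_byte_values():
    """Collect every byte value that Python's UTF-8 codec can emit."""
    # Surrogates (U+D800..U+DFFF) are skipped because they cannot be encoded.
    chars = (chr(cp) for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF)
    return set("".join(chars).encode("utf-8"))


def join_fields(fields):
    """Encode each field as UTF-8 and join with the single-byte delimiter."""
    return FIELD_MARK.join(f.encode("utf-8") for f in fields)


def split_fields(record):
    """Split on the delimiter byte, then decode each field from UTF-8."""
    return [f.decode("utf-8") for f in record.split(FIELD_MARK)]


seen = utf8_byte_values()
print("highest UTF-8 byte value:", max(seen))
print("overlap with 250-255:", sorted(seen & DELIMITER_BYTES))

fields = ["東京", "日本語", "ABC"]
print(split_fields(join_fields(fields)) == fields)

# Byte-wise ordering of UTF-8 matches code point ordering of the strings.
words = ["日本", "あい", "カナ", "abc"]
print(sorted(words, key=lambda w: w.encode("utf-8")) == sorted(words))
```

Under these assumptions no UTF-8 byte falls in 250-255, so splitting on a single delimiter byte cannot cut a character in half. Whether that helps a real system depends on every other layer (fonts, indexing, string functions) also treating the data as UTF-8.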


[[https://www.revelation.com/revweb/oecgi4p.php/O4W_HANDOFF?DESTN=O4W_RUN_FORM&INQID=WORKS_READ&SUMMARY=1&KEY=32A6C7ADD37E9BBB85256AE90005A0B9|View this thread on the Works forum...]]