What is the current status of UTF8? (OpenInsight 32-Bit) [Revelation On-Line Wiki]

OpenInsight 32-Bit, Jim Peters, Don Bakke, Matthew Crozier, Paxton Scott, Carl Pates, Gerry Van Niekerk


At 14 MAR 2009 12:25:31PM Jim Peters wrote:

I visited this subject a few releases back, and as I recall data with accented characters etc. stored as UTF8 would display nicely inside the application but would not yet print correctly. Thus, we stayed with Windows ANSI for the time being. I was just wondering what the status was with the current releases.

Do UTF8 strings containing non-ASCII characters print correctly, assuming an appropriate font is used? I have been using Set_Printer whenever it is necessary to code a report in Basic. The last time I tried it, it printed UTF8 strings as raw bytes.
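To illustrate what "printing UTF8 strings as bytes" typically looks like, here is a small Python sketch (Python is used purely for illustration; this is not Basic+ or OIPI code). It assumes the print component reads each UTF-8 byte as a Windows-1252 (ANSI) character, which is a guess at the failure mode and is not confirmed anywhere in this thread. The function name is hypothetical.

```python
def as_ansi_bytes(text: str) -> str:
    """Show how UTF-8 encoded text looks when every byte is read as Windows-1252."""
    # errors="replace" covers the few byte values that cp1252 leaves undefined.
    return text.encode("utf-8").decode("cp1252", errors="replace")


# "é" is two bytes in UTF-8 (0xC3 0xA9), so it shows up as two ANSI characters.
print(as_ansi_bytes("René"))  # prints "RenÃ©"
```

Plain ASCII text passes through unchanged, which is why the problem only shows up on accented or non-Latin characters.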

Also, when selecting and sorting, are accented characters such as e, ê, è, é, ë handled in a linguistically sensible manner at this time? If not, have any of you come up with practical workarounds?

We have a lot of European language names in our database, and it would be nice to handle them in a more elegant manner if the means are available.

I'm just trying to figure out what works at this time and what does not.

Thanks,

Jim Peters

At 14 MAR 2009 09:20PM Don Bakke wrote:

Jim,

The problem with printing UTF8 characters in OI has always been connected to the ComponentOne VSPrinter controls used by the OIPI. Despite numerous requests that these legacy controls be upgraded to support UTF8, ComponentOne has opted to put their focus on .NET.

After much searching for an alternative printing solution for OpenInsight, Revelation decided that the best solution was to use the .NET solution from ComponentOne. However, this was still no easy task, especially since one of the key objectives was to support the OIPI syntax (which was largely just a thin wrapper around the VSPrinter properties and methods). The ComponentOne .NET control was never designed to be compatible with VSPrinter. Consequently, the final solution is a good, but not perfect, emulation of the OIPI using the .NET control. Bryan Shumsky is to be commended for his work on this.

However, this solution was rolled out in 9.0. AFAIK, there is no (easy) way to port this into 8.0.x. If you are really desperate for a solution that will work in a pre-9.0 environment then drop us a line. We might have some ideas that will work for you.

dbakke@srpcs.com

SRP Computer Solutions, Inc.

At 15 MAR 2009 10:44PM Jim Peters wrote:

Hi Don,

Thanks for that detailed explanation of the challenging nature of the problem. If anyone can accomplish the impossible, it will be these guys.

There is no critical need at the moment. It just came up in conversation the other day and I was wondering what the current state of things was. Sounds like good progress.

The other question I had concerned the sorting of these accented characters in our queries. A little quick test with our data in Windows ANSI mode shows that all of these letters with diacritical marks sort AFTER the letter Z instead of in their correct alphabetical order. I was wondering if UTF8 mode handled this better, or maybe even did something worse if we are still sorting bytes.
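The "sorts AFTER the letter Z" behavior is what a plain byte-order sort produces. The Python sketch below (illustration only, not OpenInsight code) sorts the same names by their Windows-1252 bytes and by their UTF-8 bytes. It assumes the sort compares raw bytes; whether OpenInsight's UTF8 mode actually does this is not answered in the thread. The sample names and the helper name are made up.

```python
def byte_order_sort(items, encoding):
    """Sort strings by their encoded bytes, with no language-aware collation."""
    return sorted(items, key=lambda s: s.encode(encoding))


names = ["Zoe", "Eric", "Émile", "Edith"]

# "É" is 0xC9 in Windows-1252 and 0xC3 0x89 in UTF-8; both are above "Z" (0x5A).
print(byte_order_sort(names, "cp1252"))  # ['Edith', 'Eric', 'Zoe', 'Émile']
print(byte_order_sort(names, "utf-8"))   # ['Edith', 'Eric', 'Zoe', 'Émile']
```

Under that assumption, switching encodings alone would not move the accented names into place, because every non-ASCII character encodes to bytes above the ASCII range in both cases.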

Really, this could easily be fixed even in ANSI mode. We are already converting everything to uppercase for selects and sorts in most cases. I assume it is a "Convert @LOWERCASE to @UPPERCASE" type of an operation. All we would need to do in that case would be add a few more characters to our strings to convert the diacritical forms to their base character before sorting them. There is no useful purpose in sorting these characters in byte order, so it is not like it would break anybody's code.

(As if these guys don't have enough to do!)

Jim

At 15 MAR 2009 11:29PM Matthew Crozier wrote:

We get around this by sending any unicode text to OIPI as RTF text. OIPI seems to handle this for diacritics, just not for middle/far eastern characters (which I guess is an RTF limitation?). If the text is unicode, we use an RTF control to convert it to RTF text. Then use the ENABLERTF switches in OIPI so it will recognise it.
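For reference, the RTF specification represents a non-ASCII character with the `\uN` control word (N is a signed 16-bit decimal value) followed by a fallback character for readers that do not understand Unicode. The Python sketch below shows that escaping by hand, as an illustration of what an RTF conversion step produces. It is not the RTF control described above, the function name is hypothetical, and whether OIPI renders the result for any given script has not been verified here.

```python
def rtf_unicode_escape(text: str) -> str:
    """Escape text for the body of an RTF document using \\uN control words."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)          # RTF syntax characters must be escaped
        elif ord(ch) < 128:
            out.append(ch)                 # plain ASCII passes through
        else:
            data = ch.encode("utf-16-le")  # one or two 16-bit code units
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536          # \uN takes a signed 16-bit value
                out.append(f"\\u{unit}?")  # "?" is the ANSI fallback character
    return "".join(out)


print(rtf_unicode_escape("café"))  # prints caf\u233?
```

The escaped string still has to be wrapped in a complete RTF document (header, font table and so on) before it is handed to a printing component.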

HTH, M@


At 16 MAR 2009 11:24AM Jim Peters wrote:

Ahh, that is a helpful tip! For now we are getting along ok in ANSI mode, and it sounds like 9.x will eventually take care of the printing problem.

Can anyone tell me if UTF8 mode takes care of the sorting problem as far as diacritical marks are concerned? ANSI mode seems to just sort them in byte order, so they are always in the wrong place in sorted lists.

I am thinking I could create symbolic fields for sort and select purposes and use CONVERT to make the needed character substitutions so they would sort correctly. Basically just doing a CONVERT "èéêëÈÉÊË" to "EEEEEEEE" in X for the limited number of characters in the ANSI set would fix the sorting problem. It seems like this could be handled much more efficiently at the system level though, and without requiring a lot of effort or impacting any legacy code.
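The symbolic-field idea can be sketched outside OpenInsight as follows. This Python example (illustration only, not Basic+) mirrors the CONVERT call above with a translation table and uses the folded, uppercased value purely as a sort key, so the stored names keep their accents. The table covers only the E variants quoted above; the sample names and the function name are made up.

```python
ACCENTED = "èéêëÈÉÊË"
# Same mapping as: CONVERT "èéêëÈÉÊË" to "EEEEEEEE" in X
FOLD_TABLE = str.maketrans(ACCENTED, "E" * len(ACCENTED))


def sort_key(name: str) -> str:
    """Uppercase the name and fold accented E variants to plain E."""
    return name.upper().translate(FOLD_TABLE)


names = ["Zoe", "Eric", "Émile", "Edith"]
print(sorted(names, key=sort_key))  # ['Edith', 'Émile', 'Eric', 'Zoe']
```

A real table would need an entry for every accented letter in the character set in use. In Python the general case is usually handled by normalizing with `unicodedata.normalize("NFD", ...)` and dropping the combining marks, but the fixed table is the closer analogue of CONVERT.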

Jim

At 16 MAR 2009 04:57PM Matthew Crozier wrote:

"create symbolic fields for sort and select purposes and use CONVERT to make the needed character substitutions"

That's what we do - so I too would be interested to know if there's a better solution.

Cheers, M@


At 16 MAR 2009 05:14PM Paxton Scott wrote:

Greetings!

I too am very interested in the state of UTF-8. Like Jim, we have many European characters in our database. Our users only access the application through the browser, and I am now working through the confluence of Basic+ (9.0) (Great editor improvements!), JavaScript, PHP, HTML and Apache, dealing with the sending and displaying of strings (which may be filenames). Thus far it looks to me that setting everything to UTF-8 is best. At this point I have no advice to offer.

Paxton

paxton@thedce.com

At 30 MAY 2022 01:01AM Matthew Crozier wrote:

"We get around this by sending any unicode text to OIPI as RTF text. OIPI seems to handle this for diacritics, just not for middle/far eastern characters (which I guess is an RTF limitation?). If the text is unicode, we use an RTF control to convert it to RTF text. Then use the ENABLERTF switches in OIPI so it will recognise it."

Well, now that we are migrating to OI10, our RTF solution for this doesn't work because we're using a 32-bit DLL component to generate the RTF. I presume OI10 would be using a 64-bit version of the VSview component for Classic OIPI reports.

Probably being optimistic here, but is there any chance the 64-bit component could support unicode natively??

Cheers, M@

Vernon Systems

At 30 MAY 2022 05:06AM Carl Pates wrote:

Hi M@,

It all depends on ComponentOne, and I'm not aware that they have changed their policy of refusing to support Unicode in their ActiveX solution. Feel free to email me an example of the text you want to support and I'll check it, but I'm not hopeful.

Regards

Carl Pates

At 30 MAY 2022 05:24PM Gerry Van Niekerk wrote:

For what it's worth...

We have been printing in Thai, Chinese and Korean using OIPI for the last 23 years, both from Arev and OI.

In Arev I wrote an interface program using the same code as OIPI.

Thai is only a font change.

Korean and Chinese will only work when you run OI on a Chinese or Korean PC, and I would guess the same applies to other languages.

That is, printing Korean from my English server never worked.

When you produce a PDF in Korean in V9.x you need to use .NET.

Hope that makes a bit of sense.

Gerry
