UTF8 Mode (OpenInsight 32-Bit) [Revelation On-Line Wiki]

OpenInsight 32-Bit, chip fichot, Alexander Holliday


At 24 MAY 2011 12:14:36PM chip fichot wrote:

Hello

We are on OI version 8.0.8 and are running into an issue when users paste text into a text box. When the text contains the Spanish accented character ú, OI converts it to an @STM delimiter on save, because it interprets the character as ANSI 250 (a 'collision' character).

I understand that selecting UTF8 mode in the application properties and form designer options will resolve the issue (this option was not previously enabled in the app).

However, I'm not really clear if there are any ramifications to enabling UTF8 mode other than "In UTF8 mode, all character strings input and output to the user are processed as multi-byte characters sequences as defined by the UTF-8 specification.", which is the behavior I am seeking.

My real concern is if enabling UTF-8 mode has the potential to 'break' anything that is currently working. Understanding the full effects of enabling this option and/or the logic associated with it would be helpful.

I'm also a little confused as to whether or not this option is limited to data read and written by a form. If I were to read data saved by this form and save it to another record programmatically, would it be saved in the same Unicode format, or would the system convert it to @STM?

I did test printing the data using OIPI by saving it on the form, then reading it in the OIPI routine and the output was as expected (it printed the ú character correctly).

Any input/explanation would be appreciated.

TIA

- Chip

At 24 MAY 2011 01:36PM chip fichot wrote:

I did some further searching on the board and have a better understanding of this.

Since there may be issues with string processing if the UTF8 mode is turned on globally, I'm going to opt to programmatically turn it on/off as needed using the SetUTF8 routine.

At 27 MAY 2011 03:00PM Alexander Holliday wrote:

Hi Chip,

If you are STORING characters which (in latin1 ANSI) collide with the code points of the system delimiters, then you should be in UTF8 mode. At the BYTE level, all of those accented characters will appear as two units (bytes), but when you are working in UTF8 mode, they are rendered as a single character.
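To make the byte-level point concrete, here is a small sketch. It uses Python purely as a stand-in for looking at raw bytes; it is not OpenInsight code, and the value 250 for @STM is taken from the original post.

```python
# Illustration only: Python is used here to inspect raw bytes.
# This is not OpenInsight code.
STM = 250  # byte value of the @STM delimiter, as stated earlier in the thread

u_acute = "\u00fa"  # the character ú

ansi_bytes = u_acute.encode("latin-1")  # one byte in Latin-1 (ANSI)
utf8_bytes = u_acute.encode("utf-8")    # two bytes in UTF-8

print(list(ansi_bytes))  # [250]
print(list(utf8_bytes))  # [195, 186]

# In Latin-1 the character is the same byte as the delimiter.
collides_ansi = STM in ansi_bytes
# In UTF-8 neither of the two bytes is 250.
collides_utf8 = STM in utf8_bytes

print(collides_ansi, collides_utf8)  # True False
```

The single Latin-1 byte is indistinguishable from the delimiter, while the UTF-8 form of the same character uses two different bytes.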

Unfortunately, here as in other environments, there is a problem with WHERE the data comes from and HOW it gets into a row. UTF8 does its best to deal with improperly formed strings and the results can be very interesting. And, just given a random string of bytes, there is no way that a developer can always say with 100% certainty that a string is ANSI or UTF8.
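A rough way to see why certainty is not possible: the sketch below (again Python for illustration, with a hypothetical helper name `looks_like_utf8`) checks whether a byte string decodes as strict UTF-8. A failure shows the bytes are not valid UTF-8, but a success proves nothing, since the same bytes can also be a legitimate ANSI string.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8.

    Hypothetical helper for illustration. This is a heuristic only:
    True does not prove the data was written as UTF-8.
    """
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone byte 250 (0xFA) is not valid UTF-8, so this is probably ANSI.
print(looks_like_utf8(b"Per\xfa"))      # False

# The two-byte UTF-8 form of the same character passes.
print(looks_like_utf8(b"Per\xc3\xba"))  # True

# Pure ASCII is valid in both encodings, so the check says nothing.
print(looks_like_utf8(b"plain ascii"))  # True

# The same two bytes are also a valid Latin-1 string with different text.
ambiguous = b"\xc3\xba"
print(ambiguous.decode("utf-8"))    # ú
print(ambiguous.decode("latin-1"))  # Ãº
```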

If you DO decide to use UTF8, then the existing data must be converted so that the single-byte accented characters (char 128+) are in the proper encoding.
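As a sketch of what that conversion means at the byte level, the Python below re-encodes a single value from Latin-1 to UTF-8. The function name `ansi_to_utf8` is hypothetical, and the sketch assumes the input is one field value that is known to be Latin-1 and contains no system delimiters; a byte 250 that was meant as @STM would be converted as if it were ú. If the data was written under a different ANSI code page, the `source_encoding` argument would need to change.

```python
def ansi_to_utf8(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Re-encode a single ANSI value as UTF-8.

    Hypothetical helper for illustration. Assumes raw is one field value
    with no system delimiters in it, and that it really is in
    source_encoding (Latin-1 is an assumption here).
    """
    return raw.decode(source_encoding).encode("utf-8")


converted = ansi_to_utf8(b"Per\xfa")
print(list(converted))  # [80, 101, 114, 195, 186]

# Plain ASCII is unchanged by the conversion.
print(ansi_to_utf8(b"abc") == b"abc")  # True
```

Running a conversion like this twice would corrupt the data, because the UTF-8 bytes would themselves be read as Latin-1 characters the second time.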

Alexander Holliday
