{{tag>category:"OpenInsight 32-Bit" author:"Matthew Crozier" author:"Pat McNerthney"}}



==== How to tell if a string is UTF8 encoded? (OpenInsight 32-Bit) ====
=== At 17 JAN 2005 08:34:39PM Matthew Crozier wrote: ===
 
 
<QUOTE>
We're looking at converting our application to UTF8 mode and are investigating what's involved in data conversion of diacritical characters in existing data.  It seems like a process of just applying the ANSI_UTF8 function to each record. But I notice that this function should only be applied once - the string will corrupt if it is already UTF8 encoded.


Eg	ANSI_UTF8( \A9\) returns \C2A9\

but	ANSI_UTF8( \C2A9\) returns \C382C2A9\


So is there any way to tell if a string is already ANSI or UTF8 encoded?


OI 7 itself seems to be able to do this.  If you bring up an ANSI record in a window, OI converts it to UTF8 in the edit controls.  The data in these controls is written as UTF8 when saved.  But it doesn't reapply the conversion to the UTF8 data the second time the record is read into the window.  How does it know??


Any help or further tips appreciated.

Cheers, M@

</QUOTE>
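
The double-encoding effect described in the question can be reproduced outside OpenInsight. The following is a minimal Python sketch, not OpenInsight code: ''ansi_to_utf8'' is a hypothetical stand-in for ANSI_UTF8, and it assumes the ANSI code page behaves like Latin-1 for the bytes involved.

```python
def ansi_to_utf8(data: bytes) -> bytes:
    """Hypothetical stand-in for ANSI_UTF8.

    Assumption: every input byte is one Latin-1 character, which is then
    re-encoded as UTF-8. The real function may use a different code page.
    """
    return data.decode("latin-1").encode("utf-8")


once = ansi_to_utf8(b"\xa9")   # the copyright sign as a single ANSI byte
twice = ansi_to_utf8(once)     # conversion applied to already-UTF8 data

print(once.hex().upper())      # C2A9
print(twice.hex().upper())     # C382C2A9
```

The second call cannot know that C2 A9 is already a two-byte UTF8 sequence, so it treats each byte as a separate ANSI character and encodes both again.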
----

=== At 17 JAN 2005 09:14PM Pat McNerthney wrote:  ===

<QUOTE>"So is there any way to tell if a string is already ANSI or UTF8 encoded?"


No.


"OI 7 itself seems to be able to do this...How does it know??"


It doesn't; it assumes the string is UTF8.


In UTF8 mode, OI converts the passed-in string from UTF8 to 16-bit Unicode, which is what the Windows control wants.  When the control is done, OI converts the string from 16-bit Unicode back to UTF8.


During the conversion from UTF8 to 16-bit Unicode, if OI finds a bad UTF8 multi-byte character sequence, it will process the individual bytes as individual 16-bit characters.


So one way you could replicate this is to pass in your string through the following:


Utf8String=UNICODE_UTF8(UTF8_UNICODE(AnsiString))


However, this doesn't convert those ANSI strings that just happen to contain a valid multi-byte UTF8 character sequence.


Pat

</QUOTE>
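
The round trip Pat describes (decode as UTF8, fall back to one character per byte on a bad sequence, then encode as UTF8 again) can be sketched in Python to show both why it is safe to repeat and where it falls short. This is an illustration under assumptions, not OpenInsight's implementation: the function name ''round_trip'' and the error-handler name ''bytes_as_chars'' are made up here, and the fallback maps stray bytes through Latin-1.

```python
import codecs


def _bytes_as_chars(err):
    """Decode-error handler: turn each offending byte into its own character.

    Assumption: stray bytes are mapped through Latin-1, so byte 0xA9 becomes
    U+00A9. This mimics the fallback described above, not OI's actual code.
    """
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return bad.decode("latin-1"), err.end


codecs.register_error("bytes_as_chars", _bytes_as_chars)


def round_trip(data: bytes) -> bytes:
    """Rough analogue of UNICODE_UTF8(UTF8_UNICODE(data))."""
    text = data.decode("utf-8", errors="bytes_as_chars")
    return text.encode("utf-8")


ansi = b"caf\xe9"                    # ANSI bytes, not valid UTF8
utf8 = round_trip(ansi)              # stray byte is converted
assert utf8 == b"caf\xc3\xa9"
assert round_trip(utf8) == utf8      # already UTF8: left unchanged

# The limitation: these two ANSI bytes also form a valid UTF8 sequence,
# so they are taken as one UTF8 character and passed through unconverted.
ambiguous = b"\xc2\xa9"
assert round_trip(ambiguous) == ambiguous
```

Because valid UTF8 passes through unchanged, the round trip can be applied to a record more than once without the corruption seen with repeated ANSI_UTF8 calls. The last assertion shows the case it cannot handle: an ANSI string whose bytes happen to look like valid UTF8.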

----

=== At 18 JAN 2005 04:20PM Matthew Crozier wrote:  ===

<QUOTE>Thanks for clearing that up, Pat.  I can see what's going on now.

Cheers, M@


</QUOTE>


[[https://www.revelation.com/revweb/oecgi4p.php/O4W_HANDOFF?DESTN=O4W_RUN_FORM&INQID=WORKS_READ&SUMMARY=1&KEY=BBD67719638BB21D85256F8D0008AA78|View this thread on the Works forum...]]