
At 06 DEC 2004 04:12:22PM Paxton Scott wrote:

I am importing a 2046-record ASCII file into an LH table.

Sometimes the data in an ASCII record should be added to a set of MVs in an existing LH record. So I update the indexes after each write and do a BTREE.EXTRACT to check for an existing record to put the current ASCII record's data into. The lookup is against a single symbolic indexed field (composed of three real fields).

If BTREE.EXTRACT returns more than one key, I trap the original ASCII record, the keys returned, the search string and the flag. Then I go on to the next record, and at completion I write out the 'errors'.
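Here is a minimal sketch of the lookup-then-write loop described above. It is written in Python rather than BASIC+. The in-memory `table` and `index` are hypothetical stand-ins for the LH table and its BTREE index, and `load` and `title_id_of` are illustrative names, not Revelation APIs.

```python
from collections import defaultdict


def load(ascii_records, title_id_of):
    """Import records, appending to an existing row when exactly one key matches."""
    table = {}                  # hypothetical stand-in for the LH table: key -> rows
    index = defaultdict(set)    # stand-in for the TITLEID index: value -> keys
    errors = []                 # trapped records whose lookup returned >1 key
    next_key = 1                # sequential keys, as in the post
    for rec in ascii_records:
        search = title_id_of(rec)
        keys = sorted(index.get(search, ()))     # plays the role of BTREE.EXTRACT
        if not keys:                             # no match: write a new record
            table[next_key] = [rec]
            index[search].add(next_key)          # "update indexes after each write"
            next_key += 1
        elif len(keys) == 1:                     # one match: add to its MV set
            table[keys[0]].append(rec)
        else:                                    # ambiguous: trap it and move on
            errors.append({"record": rec, "search": search, "keys": keys})
    return table, errors
```

In this model the last branch can fire only if the index maps one value to several keys. So when a value that should be unique comes back with many keys, the index contents are the first thing to suspect, not the loop.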

2042 records process fine, but 4 cause multiple keys to be returned.

After 289 records are processed, BTREE.EXTRACT returns 105 keys, so I skip this record and go on. After 968, 499 keys are returned; after 3174, 216 keys; and after 1595, 51 keys. The process continues to completion (2046 records).

The value in the search string is long. Example (search field|value):

TITLEID|02-2PAC - Thug Mansion~1.mp35936879MCPLZHXE7CAHBLRVPYNQ7AR3E5FFR7WO

Other, longer values process successfully (returning zero or one key).

The search strings do contain some 'odd' characters (spaces, ~, ^, . and so on), but I see no pattern. Has anyone seen anything like this, or have any ideas or comments?

arcs@arcscustomsoftware.com

ARCS, Inc. (http://www.arcscustomsoftware.com)


At 07 DEC 2004 08:14AM The Sprezzatura Group wrote:

What is in the keys and data fields that are being returned?

Is this actually how the data is formed?

The Sprezzatura Group

World Leaders in all things RevSoft


At 07 DEC 2004 11:14AM Paxton Scott wrote:

Thank you for the response…

Well, the data IS ugly (from a human point of view), but it's just a string of characters. The keys (sequential) are normal. I thought I might be hitting some limit on the length of the indexed data or value, but longer values work OK. Still, the data clearly do not match, and I have not yet found a pattern in the keys returned or in how the data are related. I trap the records that return more than one key, write the surrounding information (original record, search string, keys returned, etc.) to an OS file for later analysis, and continue. That way I have loaded over 9000 records, and only 18 caused this problem; I skipped those. (This was from 2 input files of about 230 KB and 360 KB.)

Since then, using this strategy to load a 1,150 KB ASCII file, I have run into a more serious problem. After about 5595 records, it breaks with an index corruption message (to the system monitor) telling me to rebuild the index and try again. (A rebuild usually does not work; removing and reinstalling the index is more certain.) :-) On retry it breaks again, I think at the same place; I put in just enough diagnostics to determine this. It stops at this point, and I can't tell there is a problem until after it happens, so it is hard to work around.

My plan is to remove the input record where this happens and see if it goes further.

Would more details or copies of some of the ASCII input records or the LH records be of help?

Any work-around ideas?

Thank you again.



At 07 DEC 2004 11:43AM Paxton Scott wrote:

Well, I removed the input record whose search string triggered the error saying the indexes were corrupt and should be rebuilt before trying again.

TITLEID is my field name and the value is shown below.

TITLEID=ÿzcan Deniz - Askimin Son Hanesi.mp310596696FTYNQ4K53QMCUDLEQL3H6BRU2HW5YVW

Notice the first character (ÿ). I'm guessing that something choked on it. With that record removed, the import ran to completion (10153 writes to the LH file).

Is this a case where I need to learn more about UTF-8?



At 07 DEC 2004 11:54AM Paxton Scott wrote:

Apparently there is a problem only when ÿ is in the first position, so it may be something else. ????



At 07 DEC 2004 12:45PM support@sprezzatura.com wrote:

That char looks like a system delimiter - can you check what it is?

support@sprezzatura.com



At 07 DEC 2004 12:50PM Paxton Scott wrote:

Bingo! I bet you are right... it's 0xFF.

I guess it'd be a good idea to test my input data for system delimiters!!!
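Here is a minimal sketch of such a pre-import check. It reads the input as raw bytes and reports any byte that collides with a system delimiter. It assumes the usual Revelation delimiter values in single-byte (ANSI) mode: @RM = 255 (0xFF), @FM = 254, @VM = 253, @SVM = 252, @TM = 251, @STM = 250. `find_delimiters` and `scan_file` are hypothetical helper names.

```python
# Assumed Revelation system delimiter bytes in ANSI mode (250-255).
DELIMITERS = {
    0xFF: "@RM", 0xFE: "@FM", 0xFD: "@VM",
    0xFC: "@SVM", 0xFB: "@TM", 0xFA: "@STM",
}


def find_delimiters(line: bytes):
    """Return (offset, name) for every delimiter byte in one input line."""
    return [(i, DELIMITERS[b]) for i, b in enumerate(line) if b in DELIMITERS]


def scan_file(path):
    """Yield (line_number, hits) for each line of the import file that has delimiters."""
    with open(path, "rb") as f:          # binary mode: raw byte values, no decoding
        for n, line in enumerate(f, start=1):
            hits = find_delimiters(line)
            if hits:
                yield n, hits
```

For example, the TITLEID value that broke the import starts with ÿ, which is byte 0xFF in Latin-1, so `find_delimiters` reports `(0, "@RM")` for it.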

What strategies do people use for handling legitimate characters that are system delimiters?



At 07 DEC 2004 02:10PM support@sprezzatura.com wrote:

That depends on whether you want to go with UTF8 or CHARMAP.
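One property that makes UTF8 attractive here is that valid UTF-8 never uses the bytes 0xF5 to 0xFF. So if the delimiters stay at 0xFA to 0xFF (an assumption about the database's mode, not verified here), no encoded character can collide with them. The Python sketch below only demonstrates that encoding property. It does not show how OpenInsight's UTF8 mode is enabled, and it says nothing about CHARMAP.

```python
# Assumed Revelation delimiter bytes (ANSI mode): 0xFA (@STM) through 0xFF (@RM).
DELIMITER_BYTES = set(range(0xFA, 0x100))


def has_delimiter(raw: bytes) -> bool:
    """True if any byte of `raw` collides with an assumed delimiter byte."""
    return not DELIMITER_BYTES.isdisjoint(raw)


def utf8_is_delimiter_safe(text: str) -> bool:
    """True if the UTF-8 encoding of `text` contains no delimiter byte."""
    return not has_delimiter(text.encode("utf-8"))


title = "ÿzcan Deniz"
# Single-byte Latin-1 stores ÿ as the raw byte 0xFF, the same value as @RM.
print(title.encode("latin-1")[:1], has_delimiter(title.encode("latin-1")))  # b'\xff' True
# UTF-8 stores ÿ as the two bytes C3 BF; no UTF-8 byte is ever above 0xF4.
print(title.encode("utf-8")[:2], utf8_is_delimiter_safe(title))            # b'\xc3\xbf' True
```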



At 07 DEC 2004 03:51PM Paxton Scott wrote:

And what are the reasons that I might use one or the other?



At 07 DEC 2004 05:13PM Wilhelm Schmitt wrote:

Paxton,

we resolved our problems with system delimiters (like and ú) in xref indexes by replacing the function call in the symbolic index field.

Example: Field name => TEST

Symbolic index field => TEST_XREF

Instead of calling something like XREF({TEST},\202C2D2E2F5C\,"","1"), you can call your own swap/convert function, so that the delimiter characters are replaced before the indexing MFS takes over.
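Here is a minimal sketch of that idea, in Python for illustration. It swaps the delimiter characters for harmless stand-ins, then splits the value into words. The swap table is a simple, hypothetical transliteration, and `safe_xref` is an illustrative name. Reading the hex literal \202C2D2E2F5C\ as the word delimiters space, comma, hyphen, period, slash and backslash is also an assumption. In the real system this logic would live in the BASIC+ function called from the symbolic index field.

```python
import re

# Hypothetical transliteration of the ANSI characters that share byte values
# with the assumed Revelation delimiters (0xFA-0xFF).
SWAP = str.maketrans({
    "\u00fa": "u",   # ú (0xFA)
    "\u00fb": "u",   # û (0xFB)
    "\u00fc": "u",   # ü (0xFC)
    "\u00fd": "y",   # ý (0xFD)
    "\u00fe": "th",  # þ (0xFE)
    "\u00ff": "y",   # ÿ (0xFF)
})

# Assumed word delimiters: space , - . / \  (one or more in a row split once).
WORD_DELIMS = re.compile(r"[ ,\-./\\]+")


def safe_xref(value: str) -> list[str]:
    """Swap delimiter characters, then split the value into index words."""
    cleaned = value.translate(SWAP)
    return [w for w in WORD_DELIMS.split(cleaned) if w]
```

For example, `safe_xref("ÿzcan Deniz - Askimin Son Hanesi.mp3")` yields `["yzcan", "Deniz", "Askimin", "Son", "Hanesi", "mp3"]`. The swap is lossy (ý and ÿ both index as y). That is acceptable for a word index, but the swapped value should only feed the index and never be written back over the stored data.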

Hope this helps.

Wilhelm

