FS1001 on Write (AREV Specific)
At 12 JUL 2007 05:45:29PM David Craig wrote:
Yet another problem with this project: I'm getting the error message "fatal error writing 700006555 in table (tablename)".
"The maximum size of the key exceeds the maximum variable length."
From searching this forum, I read that I should look at framesize and sizelock. Dump tells me that the framesize is 1024 and the sizelock is 0. I read that making the framesize 256 might make a difference. I also notice that the modulo is 1 for all frames, although I'm not sure exactly what this means. I'm digging through the documentation trying to figure out whether I would be better off changing the framesize, or changing the size of the key and then making it fit the specs (8 digits starting at 700000000) when the data is output - that will bring up other complications, but if that is what has to happen, so be it.
As always, if anyone knows the answer to this, help is gratefully accepted.
David Craig.
At 13 JUL 2007 10:08AM Victor Engel wrote:
It looks like you are using numeric keys. Files with numeric keys have trouble resizing from a modulo of 1 to a modulo of 2. This is because when the modulo is 2, the keys generally all hash to the first group, so no resizing actually gets done. There are several possible solutions.
* Presize the file so that you start out with a file with a larger modulo.
* Change the threshold to something below 50% temporarily. For an existing file in the condition you describe, this should force resizing to a modulo greater than 2, thus alleviating the problem during subsequent operations.
* Remake the file using an appropriate modulo.
* Rethink the key structure and use something other than numeric.
* Copy the records from your problem file to a temporary file with the (D) option. Temporarily copy records with differently structured keys from some other file into your problem file; this should trigger a resize. Delete these temporary records, then copy your original records back from the temporary file.
In general, I've found this problem occurs mainly when the modulo is 1 or 2. If you can somehow get the modulo higher than 2 you probably will not run into the situation anymore.
I would not change the frame size from 1024. That has nothing to do with your problem.
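For anyone who finds it easier to see than to read, here is a toy model of the failure mode described above. It deliberately hard-codes the pathological case (every sequential numeric key landing in group 1 at modulo 2) - it is not ARev's actual hashing algorithm, just a sketch of why the resize from modulo 1 to modulo 2 can achieve nothing:

```python
# Toy model of the modulo-1-to-2 resize failure (NOT ARev's real engine):
# if every key keeps hashing to group 1, doubling the modulo from 1 to 2
# moves nothing, so group 1 just keeps overflowing.

def toy_hash(key: str, modulo: int) -> int:
    # Stand-in for the real hash. Assume the pathological case where
    # sequential numeric keys all land in group 1 while the modulo is 1 or 2.
    return 1 if modulo <= 2 else (int(key) % modulo) + 1

def simulate(keys, modulo):
    """Count how many keys fall into each group for a given modulo."""
    groups = {g: 0 for g in range(1, modulo + 1)}
    for k in keys:
        groups[toy_hash(k, modulo)] += 1
    return groups

keys = [str(n) for n in range(700000000, 700000100)]
print(simulate(keys, 1))   # everything in group 1, as expected
print(simulate(keys, 2))   # still everything in group 1 -> the resize bought nothing
print(simulate(keys, 3))   # spreads out once the modulo gets past 2
```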
At 13 JUL 2007 10:25AM Victor Engel wrote:
Of course, the sizelock must be 0 or 1 for the steps I described above to work.
At 13 JUL 2007 12:17PM Warren Auyong wrote:
Long sequential numeric keys have a habit of clustering into a few groups rather than spreading evenly across them. This is due to the hashing algorithm, and not much can be done about it other than changing the record keys. Other Pick flavors, most notably Universe, have different hashing algorithms you can specify when creating the file.
Presizing the modulo will overcome the modulo 1 resize flaw but will not affect clustering.
This has been discussed before.
At 13 JUL 2007 02:49PM David Craig wrote:
I've remade the table with a large number of rows, which changed the modulo, and then the write worked successfully. But when I tried to clear it to retest, it blew up in several interesting ways, so I deleted and then recreated the table with the larger number of rows, and now I'm retesting it.
I'll do a search on the key issue. Unfortunately, the purpose of this project is to generate keys that will be unique across multiple systems, so the whole reason for the table is to generate the numeric key. Would creating a compound key get around this? Unfortunately, the other fields in the table are either dates or other integer keys, so they're all basically integers.
I'll read up and post back if I have any questions. Thanks again for your time and patience;
David C.
At 13 JUL 2007 03:14PM Victor Engel wrote:
If your sizelock is zero when you do a clearfile, the modulo gets reset to 1. You should make sure the sizelock on this file is 1 (or 2 if you don't want it to resize at all). With a sizelock of 1, the file can expand but not contract.
If I were in your position, just starting out on such a project, I'd figure some way to make a one-to-one mapping from the numeric scheme you've been given to an alphanumeric one. Changing the last digit N to the Nth letter in the alphabet is likely sufficient to get around this problem. Perhaps there's a better scheme that more closely fits your data.
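A minimal sketch of that mapping, assuming a simple 0 -> "A" ... 9 -> "J" substitution on the final character (the "Nth letter" wording suggests 1 -> "A" instead; any reversible mapping will do). This is illustrative Python, not an ARev routine:

```python
# Replace the last digit of a numeric key with a letter so the key is no
# longer a long run of sequential numbers. Illustrative mapping only:
# 0 -> "A", 1 -> "B", ..., 9 -> "J". Reverse it on the way out.

def encode_key(numeric_key: str) -> str:
    """Swap the final digit for its corresponding letter."""
    last = int(numeric_key[-1])
    return numeric_key[:-1] + chr(ord("A") + last)

def decode_key(encoded_key: str) -> str:
    """Recover the original all-numeric key."""
    last = ord(encoded_key[-1]) - ord("A")
    return encoded_key[:-1] + str(last)

assert encode_key("700006555") == "70000655F"
assert decode_key("70000655F") == "700006555"
```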
If you don't mind the wasted space, you can also probably get around the problem by simply lowering the threshold far enough. How far you need to lower it depends upon how bad the clumping is.
You can make this transparent to the users by writing an MFS to do the work if you want.
At 13 JUL 2007 06:40PM David Craig wrote:
The modulo was reset to 1 when I recreated the table, as you said. However, because I set the number of records to 500k this time, after the first test (which added about 10k records) the modulo went to somewhere around 650, and after the second test, which added another 25k records, the modulo is now 1727. So I think I've made it past this problem; I can hardly wait to see what's next!
I really want to avoid fiddling with the key if I can, although I appreciate your advice. There is a huge amount of complexity outside of what I'm dealing with/writing about here, and I am under a time crunch (as always). So as long as this isn't likely to occur again, and btree.extract/index.flush keep doing their jobs successfully as the file gets larger - and there's no reason to think they won't (please feel free to contradict me here if you think differently) - then I've got to move on to the next piece of the puzzle.
I did consider making the threshold smaller, but I think it's working reasonably well now. If I understand it, a lower threshold would make access quicker at the expense of increased write overhead - would that be correct? Storage is cheap these days, so that's not a concern; developer time is always in demand, so that is.
Many thanks for your (and Warren's) help;
David C.
At 14 JUL 2007 06:28AM Warren Auyong wrote:
Do not use "E"/"e" at the end of a numeric key, or else ARev will think it is scientific notation. I made that mistake once.
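As a loose illustration of the hazard (using Python's parser, not ARev's numeric test, so the details differ): a digit string containing an "E" can be read as scientific notation and silently treated as a number rather than a literal key.

```python
# Illustration only: Python's float() treats an embedded "E" as an
# exponent marker. ARev's own numeric handling differs in detail, but
# the hazard is the same: a key that merely looks like scientific
# notation may be handled as a number instead of a literal string.
for candidate in ["7000065A5", "7000065E5"]:
    try:
        print(candidate, "->", float(candidate))        # parsed as a number
    except ValueError:
        print(candidate, "-> treated as a plain string")
```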
At 16 JUL 2007 11:28AM Victor Engel wrote:
:I did consider making the threshold smaller, but I think it's working reasonably well now. If I understand it, a lower threshold would make access quicker at the expense of increased write overhead - would that be correct? Storage is cheap these days, so that's not a concern; developer time is always in demand, so that is.
Not exactly. Think of the threshold value as a gauge of the contents of the .LK file. By setting it to 80% (the default), you're telling the system to resize the file when the .LK file gets 80% full. Files with numeric keys, if they have a problem at all, tend to have it when resizing to a modulo of 2, which is why I suggested trying a threshold of 40%. That way, if the first frame fills up, it forces a modulo of 3 rather than 2, and in my experience, once it hits 3, there is usually not a problem (although recently I had a file that hit a clumping problem for the first time at a modulo of 12000 or so).
Will there be more write overhead? Maybe, or maybe not. There will be more unused space in the .LK file. On the other hand, because the records are more sparsely distributed, the .OV file will be smaller. And it's the .OV file activity that is the most expensive in terms of total I/O. It's the file that tends to become fragmented.
There's a balance somewhere where efficiency is the greatest. On average, it's probably pretty close to 80% threshold, but I suspect it might be more efficient with a smaller threshold for files with numeric keys.
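To put rough numbers on that description, here is some back-of-the-envelope arithmetic. It is only my reading of the threshold mechanics as described above (.LK primary space = modulo x framesize, resize when the fill percentage passes the threshold), not the actual resize code, and the byte counts are made up for illustration:

```python
# Rough model of the threshold as described above: the .LK file holds
# (modulo x framesize) bytes of primary space, and a resize is triggered
# once the data stored in .LK passes the threshold percentage of that
# space. Figures are illustrative guesses, not measurements of a real file.

def lk_fill_percent(bytes_in_lk: int, modulo: int, framesize: int = 1024) -> float:
    return 100.0 * bytes_in_lk / (modulo * framesize)

def needs_resize(bytes_in_lk: int, modulo: int, threshold: float, framesize: int = 1024) -> bool:
    return lk_fill_percent(bytes_in_lk, modulo, framesize) > threshold

# One full 1024-byte frame at modulo 1:
print(lk_fill_percent(1024, 1))      # 100% -> resize due at either threshold
# At modulo 2 with roughly 820 bytes of data (about 40% of 2 x 1024):
print(needs_resize(820, 2, 80))      # False at the default 80% threshold
print(needs_resize(820, 2, 40))      # True at 40% -> pushes on toward modulo 3
```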
At 16 JUL 2007 11:29AM Victor Engel wrote:
Good point. Thanks for reminding me.