Start a Readnext from a specific Group (AREV Specific)
At 11 OCT 2005 05:43:02PM R Johler wrote:
Is there some way to start a Readnext loop from a specific group number of a file?
We have a very large file that we remove "old enough" records from every night (we call this archiving). It's a very large table (~10 million records, near the 2 gig limit); we have it sizelocked and take care to ensure we aren't adding more records today than we took out last night.
So our archiving process keeps a count of how many records to skip (the count from the last run plus however many it read) in a control file, which it uses to initialize itself each night. We then readnext until we hit this count, incrementing with a += statement, and then start examining records for possible archive candidates.
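Simplified, the nightly loop looks something like this (not the actual program - the file and record names below are just placeholders):

* Simplified sketch only - table and record names are made up, not our code.
OPEN 'ORDERS' TO ORDERS.FILE ELSE STOP
OPEN 'ARCHIVE_CONTROL' TO CONTROL.FILE ELSE STOP
READ SKIP.COUNT FROM CONTROL.FILE, 'LAST_POSITION' ELSE SKIP.COUNT = 0

SELECT ORDERS.FILE   ;* non-indexed select - returns keys in group order
COUNTER = 0
DONE = 0
LOOP
   READNEXT ID ELSE DONE = 1
UNTIL DONE DO
   COUNTER += 1
   IF COUNTER GT SKIP.COUNT THEN
      READ REC FROM ORDERS.FILE, ID THEN
         NULL   ;* examine REC (and its related records) for "old enough" here
      END
   END
REPEAT

WRITE COUNTER ON CONTROL.FILE, 'LAST_POSITION'   ;* restart point for tomorrow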
When this initial skip count is low, archiving runs just fine. When the count gets big, archiving doesn't finish before the dreaded backup starts and we have to suspend it. Archiving resumes after backup, but it often isn't finished when we want to start tomorrow's processing day.
BUT if we could somehow jump right into the group we left off at, it would be much faster than readnext-skipping 9 and 1/2 million records just to get to 'fresh' candidates.
At 11 OCT 2005 06:15PM [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] wrote:
We'd postulate that this is possible, as there are internal variables governing this. Will check a bit further tomorrow when back in the office.
World leaders in all things RevSoft
At 12 OCT 2005 11:02AM Matt Sorrell wrote:
Ralph,
What is the possibility of adding an indexed column along the lines of 'ARCHIVE_DATE' or some such that would allow you to perform a BTree search for possible candidates?
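Something along these lines, perhaps (just a sketch - ARCHIVE_DATE is a made-up column name, and the exact SEARCH.STRING layout and FLAG values should be checked against the BTREE.EXTRACT documentation):

* Sketch only - a real search would use a date-range criterion; check the docs
* for the exact SEARCH.STRING syntax before relying on this.
OPEN 'DICT.ORDERS' TO DICT.ORDERS ELSE STOP
SEARCH.STRING = 'ARCHIVE_DATE' : @VM : '15 OCT 2005' : @FM
KEYS = '' ; FLAG = ''
CALL BTREE.EXTRACT(SEARCH.STRING, 'ORDERS', DICT.ORDERS, KEYS, '', FLAG)
* KEYS should come back holding the matching row ids to feed the archiver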
At 12 OCT 2005 11:09AM Victor Engel wrote:
If it's not possible, it would be straightforward to write your own using OSBREAD to read through the frames and navigate through the group yourself.
At 12 OCT 2005 11:42AM [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] wrote:
Can't really see that being a lot faster than a Select/Readnext surely? The idea is to START at an offset into the file for speed.
World leaders in all things RevSoft
At 12 OCT 2005 11:48AM R Johler wrote:
Our logic for "old enough" is based on multiple dates each in several related records. In other words a bit of code, which could be in a symbolic, which would be horrid to index.
After some testing of alternative ways to speed things along, readnexting the whole file was actually the fastest.
In general, non-indexed selects are often slower than readnexting the whole file in our case, although if we have an indexed field that gets us close we will use that. I will recheck to see if we can have a close-enough indexed field for this - thanks.
At 12 OCT 2005 11:53AM R Johler wrote:
I can see how to get to the starting group with an OSBREAD, but I can't figure out how I could go from that starting OSBREAD group into Readnext/reads of the file in rbasic.
What have I missed?
At 12 OCT 2005 12:06PM [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] wrote:
You'd do the whole parse of keys thing yourself from thereon in …
World leaders in all things RevSoft
At 12 OCT 2005 12:38PM Richard Hunt wrote:
You could use the OSBREAD to get the first row id in that group. You would need to know how linear hashed files are read and hashed.
Once you have that row id you could use the SELECT / READNEXT and check the READNEXT row id until it matches the OSBREAD row id.
I would suggest incrementing the SIZELOCK by 2 before this procedure and decrementing it by 2 afterwards.
I am really shocked that there is no HASHING function that would be like this…
GROUP=HASH(FILE_VAR,ROW_ID)
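The compare loop would look roughly like this (untested; FIRST.ID is assumed to be the first row id in the target group, recovered beforehand by OSBREADing that group's frame):

* Untested sketch - FIRST.ID comes from the OSBREAD step described above.
OPEN 'ORDERS' TO ORDERS.FILE ELSE STOP
SELECT ORDERS.FILE
FOUND = 0
DONE = 0
LOOP
   READNEXT ID ELSE DONE = 1
UNTIL DONE DO
   IF ID EQ FIRST.ID THEN FOUND = 1
   IF FOUND THEN
      NULL   ;* from here on, ids are in or after the target group - examine them
   END
REPEAT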
At 12 OCT 2005 05:14PM Victor Engel wrote:
Right. To start at group X, start OSBREADING at position (X-1)*framesize. Then use knowledge of file structure to navigate through the group.
At 12 OCT 2005 05:28PM Victor Engel wrote:
You don't need to know the hashing algorithm. You'd need to know the hashing algorithm to DERIVE a group, but in this case, we already know it.
The program would go something like this (just typing on the fly, so please excuse stupid errors):
GROUP=XLATE(Get the next group to start from some saved location)
LOOP
GOSUB ParseKeysInGroup
GROUP += 1
UNTIL HaltCriteriaMet OR GROUP GT MODULO
REPEAT
Save group for next restart (if GROUP > MODULO, use 1)
STOP
ParseKeysInGroup:
OSBREAD FRAME FROM LKHANDLE AT (GROUP-1)*FRAMESIZE LENGTH FRAMESIZE
Process keys in frame
Get pointer to next frame (in overflow)
IF anotherframe THEN GOSUB ProcessOverflow
RETURN
ProcessOverflow:
OSBREAD FRAME FROM OVHANDLE AT pointer LENGTH FRAMESIZE
Process keys in frame
Get pointer to next frame
IF anotherframe THEN GOSUB ProcessOverflow
RETURN
I also forgot to mention that if you're flirting with a 2 Gig limit, you'll probably have to do periodic compresses or else your overflow file will keep growing despite deleting records. This is because deleting will create holes. A compress will move the overflow around to eliminate the holes. I have no idea how long that would take on a 2 Gig file – probably a bit longer than selecting the file.
At 14 OCT 2005 08:53AM Hippo wrote:
I didn't check the state of the discussion for several days … I agree with Victor, this is a safe method of obtaining the data … a method of data retrieval that can be used even when the 64K limit on keys per group is crossed … and a method for copying the records of the whole table to another file in that case.
The answer to the original question is yes. In my older monologues in this discussion I have described an MFS … the method by which I traced the SELECT * loop.
… you can either perform the MFS/BFS readnexting loop yourself, or better … create an MFS that remembers the last used FMC (it is the pointer to where the cycle should continue). After an interrupt you can run the same algorithm the next time, but replace the FMC at the start with the last FMC returned … it will process the rest of the file.
Search for ALPHA Hippo … in this forum.
Victor's solution is correct … according to the documentation.
This answers your question, but the feature is, I suppose, undocumented.
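The shape of such an MFS would be roughly as below. This is only a sketch: the pass-through plumbing and the READNEXT code value are from memory, so take the real equates and BFS handling from the documentation or a known-good MFS before using anything like it.

SUBROUTINE ARCHIVE.POS.MFS(CODE, BFS, HANDLE, NAME, FMC, RECORD, STATUS)
* Sketch only: a pass-through MFS that remembers the FMC handed back by each
* READNEXT so an interrupted scan can resume from that position.  The
* READNEXT code value and the BFS pass-through below are assumptions.

COMMON /ARCHIVEPOS/ LAST.FMC

READNEXT.CODE = 9   ;* assumed value - use the documented file system equate

* hand the call on to the next filing system in the chain
FS.LIST = BFS
NEXT.FS = FS.LIST<1,1,1>
FS.LIST = DELETE(FS.LIST, 1, 1, 1)
CALL @NEXT.FS(CODE, FS.LIST, HANDLE, NAME, FMC, RECORD, STATUS)

* on a successful READNEXT, FMC is the pointer where the cycle will continue;
* remember it (a real version would also write it to a control record)
IF CODE EQ READNEXT.CODE THEN
   IF STATUS THEN LAST.FMC = FMC
END

RETURN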
At 14 OCT 2005 09:52AM R Johler wrote:
Thanks to all! Sounds like a good feature for OI to me.
I have never used OSBREAD on an Arev file that is under the control of the Netware NLM 5.5 - any caveats there? Seems like a read wouldn't be an issue…
Thanks for the caution on the overflow size. It did blow up several years ago on July 4th - kind of an Arev fireworks show. So now we constantly monitor the overflow state, and it doesn't grow as long as we archive at least as many rows each night as we add the next day.
At 14 OCT 2005 10:04AM R Johler wrote:
Also FYI we never compress this file (call it ORDERS). It is so large it takes too long.
We do this instead:
1. Make a new table in Arev - ORDERS_NEW. Resize it to maximum size, sizelock +2. No indexes on ORDERS_NEW, and no dictionary either.
2. Run an r/basic program that readnext/reads ORDERS and writes each record to ORDERS_NEW, saving off any read or write errors (there should not be any); a rough sketch is at the end of this post.
3. Stop the LH service.
4. Move ORDERS at the OS level (REV12345.xx) to some backup folder, and copy ORDERS_NEW at the OS level as REV12345.xx into the original data folder.
5. Check OS file permissions and set if needed.
6. Start the LH service.
Naturally no one can be on Arev during this process. Doing this with a compress (or Arev copy commands) takes at least 6 hours. Doing it "behind LH's back" takes about 2 hours.
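For what it's worth, step 2 boils down to something like this (simplified; ERROR_LOG is just a placeholder name and the real program does more thorough error checking):

* Simplified sketch of step 2 only.
OPEN 'ORDERS' TO ORDERS.FILE ELSE STOP
OPEN 'ORDERS_NEW' TO NEW.FILE ELSE STOP
OPEN 'ERROR_LOG' TO LOG.FILE ELSE STOP

SELECT ORDERS.FILE
DONE = 0
LOOP
   READNEXT ID ELSE DONE = 1
UNTIL DONE DO
   READ REC FROM ORDERS.FILE, ID THEN
      WRITE REC ON NEW.FILE, ID
   END ELSE
      WRITE 'READ FAILED' ON LOG.FILE, ID   ;* should never happen
   END
REPEAT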
At 14 OCT 2005 10:17AM Michael Slack wrote:
Reading the responses to your question so far, there is some heady discussion, but I'd like to put in my humble suggestion from another point of view.
I read where you said that doing a SELECT was slower than just doing a READNEXT. I was wondering whether combining the SELECT with a SAVELIST might speed things up overall. Starting with your indexed columns, use a series of SELECT statements to help zero in on the list you want to deal with. Then do a SAVELIST. Then do a GETLIST and run your process up until backup time. Save your unused active list. Do the backup. Start your process again and pick up where you left off with a GETLIST.
Worst case, you could write the keys you want to archive as the keys to another table (the row part being empty). That would mean some extra processing to delete each key after you were done processing it. But you could use the SELECT statement in your archiving process.
Either way, once backup was done and the archiving process started again, you could pick up right where you left off.
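In rough outline (command syntax from memory; the column name ARCHIVE_DATE and the list name ARCH.CANDIDATES are only placeholders):

* Rough outline only - check the R/LIST command syntax against the docs.
OPEN 'ORDERS' TO ORDERS.FILE ELSE STOP

* build and save the candidate list while the indexes can narrow it down
PERFORM 'SELECT ORDERS WITH ARCHIVE_DATE LE "01 JAN 2005"'
PERFORM 'SAVELIST ARCH.CANDIDATES'

* later (including after the backup interruption), reload and keep working
PERFORM 'GETLIST ARCH.CANDIDATES'
DONE = 0
LOOP
   READNEXT ID ELSE DONE = 1
UNTIL DONE DO
   NULL   ;* process ID; near backup time, save the unread remainder and stop
REPEAT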
I hope this may have helped.
Michael Slack
At 14 OCT 2005 11:17AM Victor Engel wrote:
The only caveat I can think of is that depending on your setup, you might not be able to open the file (if local access is restricted by Netware security). If you can open it, you should be OK.