RList SELECT vs Reduce Performance (OpenInsight 32-Bit) [Revelation On-Line Wiki]

OpenInsight 32-Bit, Enrique Murphy, The Sprezzatura Group, John Bouley, Richard Hunt, dsig at sigafoos.org

Join The Works program to have access to the most current content, and to be able to ask questions and get answers from Revelation staff and the Revelation community

At 14 FEB 2005 09:37:00AM Enrique Murphy wrote:

I have not found specific answers to the questions below.

1. RList SELECT in OI 7+ works different than TCL SELECT in ARev. If you search on two indexed fields, the RList seems to search on both fields and then join both results, unlike ARev SELECT who seemed to search on the first indexed field from left to right, and then on the next, so the second search is applied to the result of the first one. This different way of working forced me to change the way to search in OI. In ARev, I used to build just one SELECT command, and this worked OK. In OI, I found it much more faster to build multiple RList commands and submit them one by one. Each command searchs on just one field, and I order them according to the number of different values each index has. This is: a date field, for example, is always searched first than an field holding 2 or 3 values only. Is this true?. If it is, I have never seen any documentation about this (major) change.

2. The second question is about Reduce. I have never used Reduce for searching, but I have seen many posts telling that Reduce should be faster than RList. I have one database with 3 million (and growing) records. In some cases, I have to search and sort the database before processing the result, and in other cases, just search and process (ordering does not matter). What I mean with search and sort before processing is search and sort before starting the Loop..ReadNext..Repeat to exhaust the result list. In the case of searching and sorting before processing RList takes a long time (half an hour to select and sort 75000 records out of 3 million). I would like to know if Reduce would really improve RList performance at least in the case of searching and processing (without ordering).

Enrique Murphy

At 14 FEB 2005 10:18AM The Sprezzatura Group wrote:

There are differences between the parsers in AREV and OI, but the select logic is basically similar.

All OpenInsight
RLIST( SELECT...)
calls and all AREV
PERFORM SELECT...
calls are just shells around the REDUCE subroutine.

The difference is that the shell commands resolves the records. This means that the code does a readnext through each returned record so it can write the results to the LISTS or SYSLISTS table. When the system writes to these tables, it must also delete the earliest stored entry in the list.

This is the difference in times, which would be more pronounced for non-indexed fields.

As for why OI takes longer, we'll need the version of AREV to run our own timing checks.

Generally, though, we've found that if the results required are all indexed fields, a BTREE.EXTRACT is usually the quickest way to return the key lists.

The Sprezzatura Group

World Leaders in all things RevSoft

At 14 FEB 2005 10:43AM John Bouley wrote:

I also have experienced differences between OI and AREV. One major problem with OI was the fact that on occasion we saw problems when it returning bad results (all rows in the table). This was solved by breaking the select into multiple statements. However, the real reason tunred out to be one of the indexed fields was corrupted! And instead of returning an error and no results it retunred all rows.

In AREV it showed an error and returned no results.

Again, this was in 4.13 and I think in 7.01. Not sure about 7.1 because the problem only manifested istelf when there was a corrupt index so it is hard to recreate.

I believe there are other differences in how it parses the Anding/Oring. In OI you must use the parentesis more often than AREV.

Just my 2 cents.

At 14 FEB 2005 02:11PM Richard Hunt wrote:

I believe that using indexing will greatly reduce the select processing time. Although I would not suggest or even use BTREE… I would use CROSS REFERENCING.

If you are selecting for a date or a range of dates the indexing should retrieve the rows within the select criteria within a second. Although if you are trying to do very complicated selects or indexing… well… good luck.

Try to think the select process thru. The goal is to be able to use indexing to reduce the initial number of rows selected. Then you might want to select another time with the minimal row count. Once you have the minimal row count then do a sort.

I also do not like the standard indexing offered by OI. They are way to ohhhh… "busy complex", and I have heard alot about corruption problems. A simple cross referencing of record (row) ids would do and also would be quite fast. So fast that you would not need seperate indexing work stations. The index update would happen during the RTP (MFS) write proceedure. I built mine and tested it. You don't even know the indexing is happening during the "WRITE". And I am talking about over 1 million record (row) files (tables).

BTREE indexing… unless you desperately need it… I would definately stay away from it.

At 14 FEB 2005 04:09PM dsig _at_ sigafoos.org wrote:

'BTREE indexing… unless you desperately need it… I would definately stay away from it.'

Could you please send some facts about problems as we have been using btree indexing since its inception and although a few hickups have happened .. btrees are wonderful.

Thanks .. looking forward to the info

dsig _at_ sigafoos.org.com onmouseover=window.status=the new revelation technology .. a refreshing change;return(true)"

David Tod Sigafoos ~ SigSolutions

At 14 FEB 2005 04:37PM John Bouley wrote:

I would also second the motion. We use Btrees extensively. As a "mater of fact" a Cross-reference is nothing more than a Btree on a symbolic that breaks the field into separate words. So actually putting a Cross-Reference on a Date field would be counter productive.

As far as writting your own logic that is entirely up to you but in my opinion it is like re-inventing the wheel.

John

At 14 FEB 2005 05:16PM Richard Hunt wrote:

I am clearly stating an opinion I personally have. I also consider it to be constructive not destructive.

I do not like BTREE indexing due just a few listed following items…

1) The dictionary definition for the key field must be at least as wide as any key field text plus any delimiters (for mv's).

2) It seems to convert character strings to numeric strings if possible.

3) It seems to remove leading zeros.

4) I read alot about corruption and rebuilding.

I have an unwillingness to verify and determine the maximum string length of the key. And then if the maximum becomes greater for any reason, you have to rebuild the index. It kinda rubs the wrong way when the product's main issue is "variable length, multi value" database.

I just rather use a different type of indexing when all I need to do is select the 50 items that are dated today out of the 1,000,000 items. BTREE indexing is not efficient (way too much overhead) for that type of demand. A CROSS REFERENCE (not the BTREE cross reference) indexing kinda like a record containing fields of row ids and the key being the date. BTREE seems to do a ton of extra other things. Also it appears to me that there is only one index table for each data table. So if you have several indexes for your one data table, then the one index table is quite overloaded. That could be why the indexes seem to corrupt quite a bit.

I suppose if you need BTREE indexing because of its way of indexing and etc. then it would be a good thing to use. Although if you are looking for a very simple indexing, like noted above, BTREE indexing is, in my opinion, quiet too much and too burdening on the system.

In short… why use something that takes a long time to build or update when something else can do it alot faster and more reliable.

Inventing a wheel, sometimes is not so hard. In fact building a indexing MFS was rather easy.

Currently I do find that the "indexing" and the "select" processes are the most "buggiest".

At 15 FEB 2005 09:39AM Enrique Murphy wrote:

Thanks for the answer (I can't see the time of it to compare it to the other, but I think it was fast enough). I have read several posts saying that RList Select is a shell over Reduce, so from this should be true that using Reduce may increase the overall processing time (Select / ReadNext).
I think a good approach would then be something like this:
1.1 Test what fields are indexed and use BTree.Extract on them first.
1.2 Loop through the rest of the fields to finalize the search and/or sort
2. Use Reduce if none of the fields are indexed. In this case, I have experienced an error in ReadNext..by when the Reduce and Select..by commands are done in a called function (this is, not in the same stored procedure that readnexts the rows). This was so in OI 7.01, I don't know if it has been corrected.
Enrique

At 15 FEB 2005 09:43AM The Sprezzatura Group wrote:

Since reduce is a subroutine (REDUCE), select is a subroutine (RTP12) and readnext is a subroutine (RTP11 or RTP11.INDEX), then calling them from your own subroutine would not be an issue. Usually, is the cursor values that cause problems when working with these functions.

The Sprezzatura Group

World Leaders in all things RevSoft

At 15 FEB 2005 10:03AM Enrique Murphy wrote:

Richard:

I did not understand completely the way you are managing XREFs. You said you don't use the Btree on the simbolic field?. So you created a secondary table with the dates as the key to store the keys of the main table?. Certainly, the Revelation world is some kind of paradise for crazy programmers with crazy ideas! (Me included!) :)

I never care about the length of keys or fields before building a Btree, but I never had a problem because of that. Did you?

I also had problems of corrupted indexes, but until now never such a headache to stay away from them. All I want is to get better performance in OI than in ARev.

Thanks anyway.

Enrique

At 15 FEB 2005 10:15AM Enrique Murphy wrote:

The sequence that gave me an error was:

1. Open a file in Subroutine A

2. Call Function B, which calls Reduce and Select..By and returns the cursor to Subroutine A.

3. ReadNext..By using the cursor returned in Subroutine A gave me an error.

I remember that opening the file after the call to Funcion B worked, but before did not work.

Do Reduce or Select..by modify the file handle?. What may be wrong with the cursor?.

At 15 FEB 2005 10:31AM The Sprezzatura Group wrote:

From what you're saying, there's nothing obviously wrong with the logic.

Do you recall what the error was?

The Sprezzatura Group

World Leaders in all things RevSoft

At 15 FEB 2005 10:46AM dsig _at_ sigafoos.org wrote:

' .. As a "mater of fact" a Cross-reference is nothing more than a Btree on a symbolic that breaks the field into separate words .. '

i wasn't going to mention that .. DOH!

dsig _at_ sigafoos.org.com onmouseover=window.status=the new revelation technology .. a refreshing change;return(true)"

David Tod Sigafoos ~ SigSolutions

View this thread on the Works forum...