Large file import (OpenInsight 32-Bit) [Revelation On-Line Wiki]

OpenInsight 32-Bit, Colin Rule, //www.sprezzatura.com]The Sprezzatura Group[/url], Mike Ruane, //www.sprezzatura.com]The Sprezzatura Group[/url], Stefano Cavaglieri, Richard Hunt, Bob Carten, Bruce Cameron, Warren Auyong, Sean FitzSimons

Join The Works program to have access to the most current content, and to be able to ask questions and get answers from Revelation staff and the Revelation community

At 05 SEP 2008 04:34:36AM Colin Rule wrote:

I am importing 515,000 records from a tab delimited file, 150Mb.

I am using the 'Remove LINE From DATA AT position Setting flag' facility to extract the data quickly, create a record with about 20 fields and write it to the database.

I ran the import last night, started at 9:00pm, still running at 8:00am the next day, 60% complete.

There is little left I can do optimise the code, but I have made a few small changes today, and will try again.

I am running locally on a PC, not connected to a server, so no interface with the LH drivers etc, and my PC is reasonably quick.

Does anyone have any suggestions as to ways to make this run faster, at least so that it can be run in a timeframe that does not involve a weekend.

Thanks

Colin

At 05 SEP 2008 06:34AM [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] wrote:

Post the code ;)

The Sprezzatura Group

World leaders in all things RevSoft

At 05 SEP 2008 07:18AM Colin Rule wrote:

Code is below, with bits that are irrelevant stripped out.

CONVERT @LOWER_CASE TO @UPPER_CASE IN FILE

OSOPEN FILE TO DOSFILE ELSE RETURN

CALL UTILITY("CURSOR","H")

OSREAD DATA FROM DOSFILE ELSE RETURN

OSCLOSE DOSFILE

SWAP CHAR(13) WITH @FM IN DATA

SWAP CHAR(10) WITH "" IN DATA

Position=1

MAX_LINES=.MAX_LINES-]TEXT

NO_LINES=COUNT(DATA,@FM)+1

CLEARFILE RATE_BUILDUP ELSE NULL

CLEARFILE RATE_INDEX ELSE NULL

FOR I=2 TO NO_LINES

	Remove LINE From DATA AT position Setting flag

	SWAP CHAR(9) WITH @FM IN LINE

	ROW="

	FOR C=1 TO NO_COLS

		COL=COLUMNS

		ROW=LINE

	NEXT C

	RATE_CODE	=TRIM(ROW)

	SUPP_CODE	=TRIM(ROW)

	PROD_MAJOR	=TRIM(ROW)

	PROD_MINOR	=TRIM(ROW)

	COMM_MAJOR	=TRIM(ROW)

	COMM_MINOR	=TRIM(ROW)

	BRAND_NAME	=TRIM(ROW)

	PRODUCT		=TRIM(ROW)

	TYPE		=TRIM(ROW)

	DIMS		=TRIM(ROW)

	MATERIAL	=TRIM(ROW)

	PRICEQ		=TRIM(ROW)

	UNIT		=TRIM(ROW)

	RATE		=TRIM(ROW)

	INSTALL		=TRIM(ROW)

	IF NOT(NUM(INSTALL)) THEN INSTALL="

	DESC=TRIM(BRAND_NAME:" ":PRODUCT:" ":TYPE:" ":DIMS:" ":MATERIAL)

	IF RATE_CODE THEN GOSUB WRITE_RATE

	CALL YIELD()

NEXT I

TIME2=TIMEDATE()

MSG="

MSG=Items import=:NO_LINES

MSG=Import start time=:TIME1

MSG=Import finish time=:TIME2

CALL BUILD_MSG(@WINDOW,MSG,"I")

RETURN

WRITE_RATE:

CONVERT @LOWER_CASE TO @UPPER_CASE IN UNIT

IF NOT(NUM(PRICEQ)) THEN RETURN

IF PRICEQ THEN RATE=RATE/PRICEQ

ITEM="

ITEM=DESC

ITEM=M"

ITEM=UNIT

ITEM=ICONV(RATE,"MD4")

ITEM=DATE

ITEM="				;ITEM="

ITEM=DESC				;ITEM=Installation time"

ITEM=1				;ITEM=INSTALL

ITEM=10000			;ITEM=INSTALL*10000

ITEM=UNIT				;ITEM=Hours"

ITEM=ICONV(RATE,"MD4");ITEM="

ITEM=SUPP_CODE

ITEM=PROD_MAJOR

ITEM=PROD_MINOR

ITEM=COMM_MAJOR

ITEM=COMM_MINOR

WRITE ITEM ON RATE_BUILDUP,RATE_CODE ELSE NULL

RETUR

At 05 SEP 2008 07:21AM Mike Ruane wrote:

Colin-

If this is a new file, when you make it, presize it.

If it it not a new file, make a new one, presize it, copy the existing records in, and then import.

If there is indexing on the table, take it off, import, put it back on.

Hope it helps-

Mike

At 05 SEP 2008 07:48AM [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] wrote:

In addition to Mike's comments a few minor coding points spring straight to mind from an optimization POV:

1) Consider using convert instead of swap. It's faster. e.g

Instead of
SWAP CHAR(13) WITH @FM IN DATA
SWAP CHAR(10) WITH "" IN DATA
Do:
 convert \0D0A\ to \FE\ in data
Likewise for
 SWAP CHAR(9) WITH @FM IN LINE
do
 convert \09\ to \FE\ in line
Use embedded characters with the \xx\ notation rather than the char() function, because char() is an opcode and therefore has overhead

2) Consider yield() less - just call it every 100 iterations for example …

3) Do you need to put each item from row into a separate variable? If there's no processing to do on a field in ROW then there's little point putting it into another variable before transferring it into the ITEM record. Just more overhead (though it does make the code more readable, but then equates would help in that situation …)

The Sprezzatura Group

World leaders in all things RevSoft

At 05 SEP 2008 08:37AM Stefano Cavaglieri wrote:

Colin -

Does your OI application run in UTF8 character mode? If yes, this is a know issue (at least I know it is), consider a for/next loop instead of remove/from/at.

- Stefano

At 05 SEP 2008 12:57PM Richard Hunt wrote:

A few more efficient comments…

1) The SWAP CHAR(9) WITH @FM sentence should be done before the REMOVE sentence loops. Use the sentence CONVERT CHAR(9) TO @VM IN LINE Using @VM will cause you to modify other sentences in your program.

2) The TRIM function… Use the TRIM function once right after the REMOVE sentence. The TRIM function is a multivalue function.

3) When combining values within a field try using only one sentence. What I mean is for sentence RECORD=LINE and RECORD=LINE replace those sentences with RECORD=LINE:@VM:LINE

4) Is it possible the the file you are importing to has indexes???

At 05 SEP 2008 02:13PM Bob Carten wrote:

Colin - Others have made the most valuable the suggestions, especially the suggestion to presize the files and sizelock them.

I made a quick pass with a couple of other optimizations

* pull everything possible out of the main loop

* don't bother with swaps, just use char(13) and char(9) as delimiters

* store intermediate values in dimensioned arrays rather than dynamic arrays

* I did not change the Write part, but you could use dimensioned arrays, matwrite there.

* note - I just typed this out, not tested

- Bob

CONVERT @LOWER_CASE TO @UPPER_CASE IN FILE

OSOPEN FILE TO DOSFILE ELSE RETURN

CALL UTILITY("CURSOR","H")

OSREAD DATA FROM DOSFILE ELSE RETURN

OSCLOSE DOSFILE

equ cr$ to \0D\

equ tab$ to \09\

NO_LINES=COUNT(DATA,@FM)+1

* Sizelock before clear

equ set_sizelock$ to 5

call Fix_LH(RATE_BUILDUP, set_sizelock$, 0,1)

call Fix_LH(RATE_INDEX, set_sizelock$, 0,1)

CLEARFILE RATE_BUILDUP ELSE NULL

CLEARFILE RATE_INDEX ELSE NULL

* Use matrices instead of dynamic array for intermediate data

DIM ROW(NO_COLS)

DIM LINE(NO_COLS)

DIM MAP(NO_COLS)

BLANK="

MATPARSE BLANK INTO ROW

MATPARSE BLANK INTO LINE

MATPARSE BLANK INTO MAP

* Use equates to avoid intermediate variables

equ RATE_CODE to ROW(1)

equ SUPP_CODE to ROW(2)

equ PROD_MAJOR to ROW(3)

equ PROD_MINOR to ROW(4)

equ COMM_MAJOR to ROW(5)

equ COMM_MINOR to ROW(6)

equ BRAND_NAME to ROW(7)

equ PRODUCT to ROW(8)

equ TYPE to ROW(9)

equ DIMS to ROW(10)

equ MATERIAL to ROW(11)

equ PRICEQ to ROW(12)

equ UNIT to ROW(13)

equ RATE to ROW(14)

equ INSTALL to ROW(15)

* Build map outside of loop

FOR C=1 TO NO_COLS
 MAP(C)=COLUMNS
NEXT C

* Main loop

FOR I=2 TO NO_LINES

parse on line feeds and tabs, don't need the swaps
 work=dataipos,lf$
 ipos=col2()+2 ; * Skip over lf$
 matparse work into line using tab$
Mapping done using dimmed arrays
 FOR C=1 TO NO_COLS
    COL=map(C)
    ROW(C)=trim(LINE(COL))
 NEXT C   
 IF NOT(NUM(INSTALL)) THEN INSTALL="
 DESC=TRIM(BRAND_NAME:" ":PRODUCT:" ":TYPE:" ":DIMS:" ":MATERIAL)
 IF RATE_CODE THEN GOSUB WRITE_RATE
 CALL YIELD()
NEXT I

TIME2=TIMEDATE()

MSG="

MSG=Items import=:NO_LINES

MSG=Import start time=:TIME1

MSG=Import finish time=:TIME2

CALL BUILD_MSG(@WINDOW,MSG,"I")

RETURN

WRITE_RATE:

CONVERT @LOWER_CASE TO @UPPER_CASE IN UNIT

IF NOT(NUM(PRICEQ)) THEN RETURN

IF PRICEQ THEN RATE=RATE/PRICEQ

ITEM="

ITEM=DESC

ITEM=M"

ITEM=UNIT

ITEM=ICONV(RATE,"MD4")

ITEM=DATE

ITEM=" ;ITEM="

ITEM=DESC ;ITEM=Installation time"

ITEM=1 ;ITEM=INSTALL

ITEM=10000 ;ITEM=INSTALL*10000

ITEM=UNIT ;ITEM=Hours"

ITEM=ICONV(RATE,"MD4");ITEM="

ITEM=SUPP_CODE

ITEM=PROD_MAJOR

ITEM=PROD_MINOR

ITEM=COMM_MAJOR

ITEM=COMM_MINOR

WRITE ITEM ON RATE_BUILDUP,RATE_CODE ELSE NULL

RETURN

At 05 SEP 2008 02:38PM Bruce Cameron wrote:

Colin,

Not sure if this is faster then what you have as I haven't benchmarked but you could try…

osopen data from dosfile then

convert char(10) to "" in data

convert char(13) to @fm in data

data=delete(data,1,0,0)

totrows=dcount(data,@fm)

for val=1 to totrows

   row=data

delete the row from the variable

   data=delete(data,1,0,0)

Do your stuff

next val

end

Also, you could move your lines of..

 IF RATE_CODE Then

AND

IF NOT(NUM(PRICEQ))

up in your code so that if they don't pass you don't have

to read and set any variables. EG.

 Row=Trim(Row)

 RATE_CODE =ROW

 IF RATE_CODE Then

    SUPP_CODE  =ROW

    PROD_MAJOR =ROW

    PROD_MINOR =ROW

    COMM_MAJOR =ROW

    COMM_MINOR =ROW

    BRAND_NAME =ROW

    PRODUCT    =ROW

    TYPE       =ROW

    DIMS       =ROW

    MATERIAL   =ROW

    PRICEQ     =ROW

    IF NOT(NUM(PRICEQ)) Else

       UNIT       =ROW

       RATE       =ROW

       INSTALL    =ROW

       IF NOT(NUM(INSTALL)) THEN INSTALL="

       DESC=TRIM(BRAND_NAME:" ":PRODUCT:" ":TYPE:" ":DIMS:" ":MATERIAL)

       CONVERT @LOWER_CASE TO @UPPER_CASE IN UNIT

       IF PRICEQ THEN RATE=RATE/PRICEQ

          ITEM="

          ITEM=DESC

          ITEM=M"

          ITEM=UNIT

          ITEM=ICONV(RATE,"MD4")

          ITEM=DATE

          ITEM=DESC:@VM:"Installation time"

          ITEM=1:@VM:INSTALL

          ITEM=10000:@VM:(INSTALL*10000)

          ITEM=UNIT:@VM:"Hours"

          ITEM=ICONV(RATE,"MD4")

          ITEM=SUPP_CODE

          ITEM=PROD_MAJOR

          ITEM=PROD_MINOR

          ITEM=COMM_MAJOR

          ITEM=COMM_MINOR

          WRITE ITEM ON RATE_BUILDUP,RATE_CODE ELSE NULL

End

End

Food for thought. BC

At 05 SEP 2008 07:54PM Colin Rule wrote:

No UTF8 is not being used.

Thanks anyway, not that I have any ieda what this does.

Colin

At 07 SEP 2008 01:36AM Warren Auyong wrote:

Isn't the bracket operator and col2() faster than a remove operator?

So why bother with the converts or swaps:

Equ CR$ to \0D\

Equ TAB$ to \09\

pos=1

header=data1,CR$

pos=col2() + 2 ;*(skip crlf)

Loop

line=datapos,CR$

until line eq ''

pos=col2() + 2

RATE_CODE=line1,TAB$

SUPP_CODE=linecol2()+1,TAB$

PROD_MAJOR=linecol2()+1,TAB$

PROD_MINOR=linecol2()+1,TAB$

COMM_MAJOR=linecol2()+1,TAB$

COMM_MINOR=linecol2()+1,TAB$

BRAND_NAME=linecol2()+1,TAB$

PRODUCT=linecol2()+1,TAB$

TYPE=linecol2()+1,TAB$

DIMS=linecol2()+1,TAB$

MATERIAL=linecol2()+1,TAB$

PRICEQ=linecol2()+1,TAB$

UNIT=linecol2()+1,TAB$

RATE=linecol2()+1,TAB$

INSTALL=linecol2()+1,TAB$

* build output record and trim it before write

repeat

At 07 SEP 2008 07:33PM Colin Rule wrote:

Thanks people.

I have implemented most of the changes, and each one seems to shave a millisecond or two, which over 500,000 records, all add up. Finished the import after 18 hours, so I will be trying a few more to see what has the biggest effect.

Colin

At 08 SEP 2008 06:07AM Colin Rule wrote:

Thanks Mike,

I have used a separate temp file to try the presize and it certainly makes the biggest difference, but…

I have the Dicts in the DATA folder of the Applicaiton, and the DATA files reside in one of many folders, one for each Project. When I try and delete a table in Database manager, it trashes the DICT for that table.

How can you delete the DATA portion only, and re-create the DATA only, and leave the Dictionaries alone… using the Database Manager/Table Builder?

Colin

At 08 SEP 2008 08:21AM Sean FitzSimons wrote:

Colin,

You will need to issue a CLEAR_TABLE command. There isn't a hook to this subroutine in the Database Manager/Table Builder at this point in time.

Sean

View this thread on the Works forum...