Join The Works program to have access to the most current content, and to be able to ask questions and get answers from Revelation staff and the Revelation community

At 05 SEP 2008 04:34:36AM Colin Rule wrote:

I am importing 515,000 records from a tab delimited file, 150Mb.

I am using the 'Remove LINE From DATA AT position Setting flag' facility to extract the data quickly, create a record with about 20 fields and write it to the database.

I ran the import last night, started at 9:00pm, still running at 8:00am the next day, 60% complete.

There is little left I can do optimise the code, but I have made a few small changes today, and will try again.

I am running locally on a PC, not connected to a server, so no interface with the LH drivers etc, and my PC is reasonably quick.

Does anyone have any suggestions as to ways to make this run faster, at least so that it can be run in a timeframe that does not involve a weekend.

Thanks

Colin


At 05 SEP 2008 06:34AM [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] wrote:

Post the code ;)

The Sprezzatura Group

World leaders in all things RevSoft


At 05 SEP 2008 07:18AM Colin Rule wrote:

Code is below, with bits that are irrelevant stripped out.

CONVERT @LOWER_CASE TO @UPPER_CASE IN FILE
OSOPEN FILE TO DOSFILE ELSE RETURN
CALL UTILITY("CURSOR","H")
OSREAD DATA FROM DOSFILE ELSE RETURN
OSCLOSE DOSFILE
SWAP CHAR(13) WITH @FM IN DATA
SWAP CHAR(10) WITH "" IN DATA
Position=1
MAX_LINES=.MAX_LINES-]TEXT
NO_LINES=COUNT(DATA,@FM)+1
CLEARFILE RATE_BUILDUP ELSE NULL
CLEARFILE RATE_INDEX ELSE NULL
FOR I=2 TO NO_LINES
	Remove LINE From DATA AT position Setting flag
	SWAP CHAR(9) WITH @FM IN LINE
	ROW="
	FOR C=1 TO NO_COLS
		COL=COLUMNS
		ROW=LINE
	NEXT C
	RATE_CODE	=TRIM(ROW)
	SUPP_CODE	=TRIM(ROW)
	PROD_MAJOR	=TRIM(ROW)
	PROD_MINOR	=TRIM(ROW)
	COMM_MAJOR	=TRIM(ROW)
	COMM_MINOR	=TRIM(ROW)
	BRAND_NAME	=TRIM(ROW)
	PRODUCT		=TRIM(ROW)
	TYPE		=TRIM(ROW)
	DIMS		=TRIM(ROW)
	MATERIAL	=TRIM(ROW)
	PRICEQ		=TRIM(ROW)
	UNIT		=TRIM(ROW)
	RATE		=TRIM(ROW)
	INSTALL		=TRIM(ROW)
	IF NOT(NUM(INSTALL)) THEN INSTALL="
	DESC=TRIM(BRAND_NAME:" ":PRODUCT:" ":TYPE:" ":DIMS:" ":MATERIAL)
	IF RATE_CODE THEN GOSUB WRITE_RATE
	CALL YIELD()
NEXT I
TIME2=TIMEDATE()
MSG="
MSG=Items import=:NO_LINES
MSG=Import start time=:TIME1
MSG=Import finish time=:TIME2
CALL BUILD_MSG(@WINDOW,MSG,"I")

RETURN

WRITE_RATE:

CONVERT @LOWER_CASE TO @UPPER_CASE IN UNIT
IF NOT(NUM(PRICEQ)) THEN RETURN
IF PRICEQ THEN RATE=RATE/PRICEQ
ITEM="
ITEM=DESC
ITEM=M"
ITEM=UNIT
ITEM=ICONV(RATE,"MD4")
ITEM=DATE
ITEM="				;ITEM="
ITEM=DESC				;ITEM=Installation time"
ITEM=1				;ITEM=INSTALL
ITEM=10000			;ITEM=INSTALL*10000
ITEM=UNIT				;ITEM=Hours"
ITEM=ICONV(RATE,"MD4");ITEM="
ITEM=SUPP_CODE
ITEM=PROD_MAJOR
ITEM=PROD_MINOR
ITEM=COMM_MAJOR
ITEM=COMM_MINOR
WRITE ITEM ON RATE_BUILDUP,RATE_CODE ELSE NULL

RETUR


At 05 SEP 2008 07:21AM Mike Ruane wrote:

Colin-

If this is a new file, when you make it, presize it.

If it it not a new file, make a new one, presize it, copy the existing records in, and then import.

If there is indexing on the table, take it off, import, put it back on.

Hope it helps-

Mike


At 05 SEP 2008 07:48AM [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] wrote:

In addition to Mike's comments a few minor coding points spring straight to mind from an optimization POV:

1) Consider using convert instead of swap. It's faster. e.g

Instead of

SWAP CHAR(13) WITH @FM IN DATA
SWAP CHAR(10) WITH "" IN DATA

Do:

 convert \0D0A\ to \FE\ in data

Likewise for

 SWAP CHAR(9) WITH @FM IN LINE

do

 convert \09\ to \FE\ in line

Use embedded characters with the \xx\ notation rather than the char() function, because char() is an opcode and therefore has overhead

2) Consider yield() less - just call it every 100 iterations for example …

3) Do you need to put each item from row into a separate variable? If there's no processing to do on a field in ROW then there's little point putting it into another variable before transferring it into the ITEM record. Just more overhead (though it does make the code more readable, but then equates would help in that situation …)

The Sprezzatura Group

World leaders in all things RevSoft


At 05 SEP 2008 08:37AM Stefano Cavaglieri wrote:

Colin -

Does your OI application run in UTF8 character mode? If yes, this is a know issue (at least I know it is), consider a for/next loop instead of remove/from/at.

- Stefano


At 05 SEP 2008 12:57PM Richard Hunt wrote:

A few more efficient comments…

1) The SWAP CHAR(9) WITH @FM sentence should be done before the REMOVE sentence loops. Use the sentence CONVERT CHAR(9) TO @VM IN LINE Using @VM will cause you to modify other sentences in your program.

2) The TRIM function… Use the TRIM function once right after the REMOVE sentence. The TRIM function is a multivalue function.

3) When combining values within a field try using only one sentence. What I mean is for sentence RECORD=LINE and RECORD=LINE replace those sentences with RECORD=LINE:@VM:LINE

4) Is it possible the the file you are importing to has indexes???


At 05 SEP 2008 02:13PM Bob Carten wrote:

Colin - Others have made the most valuable the suggestions, especially the suggestion to presize the files and sizelock them.

I made a quick pass with a couple of other optimizations

* pull everything possible out of the main loop

* don't bother with swaps, just use char(13) and char(9) as delimiters

* store intermediate values in dimensioned arrays rather than dynamic arrays

* I did not change the Write part, but you could use dimensioned arrays, matwrite there.

* note - I just typed this out, not tested

- Bob

CONVERT @LOWER_CASE TO @UPPER_CASE IN FILE

OSOPEN FILE TO DOSFILE ELSE RETURN

CALL UTILITY("CURSOR","H")

OSREAD DATA FROM DOSFILE ELSE RETURN

OSCLOSE DOSFILE

equ cr$ to \0D\

equ tab$ to \09\

NO_LINES=COUNT(DATA,@FM)+1

* Sizelock before clear

equ set_sizelock$ to 5

call Fix_LH(RATE_BUILDUP, set_sizelock$, 0,1)

call Fix_LH(RATE_INDEX, set_sizelock$, 0,1)

CLEARFILE RATE_BUILDUP ELSE NULL

CLEARFILE RATE_INDEX ELSE NULL

* Use matrices instead of dynamic array for intermediate data

DIM ROW(NO_COLS)

DIM LINE(NO_COLS)

DIM MAP(NO_COLS)

BLANK="

MATPARSE BLANK INTO ROW

MATPARSE BLANK INTO LINE

MATPARSE BLANK INTO MAP

* Use equates to avoid intermediate variables

equ RATE_CODE to ROW(1)

equ SUPP_CODE to ROW(2)

equ PROD_MAJOR to ROW(3)

equ PROD_MINOR to ROW(4)

equ COMM_MAJOR to ROW(5)

equ COMM_MINOR to ROW(6)

equ BRAND_NAME to ROW(7)

equ PRODUCT to ROW(8)

equ TYPE to ROW(9)

equ DIMS to ROW(10)

equ MATERIAL to ROW(11)

equ PRICEQ to ROW(12)

equ UNIT to ROW(13)

equ RATE to ROW(14)

equ INSTALL to ROW(15)

* Build map outside of loop

FOR C=1 TO NO_COLS

 MAP(C)=COLUMNS

NEXT C

* Main loop

FOR I=2 TO NO_LINES

  • parse on line feeds and tabs, don't need the swaps
 work=dataipos,lf$
 ipos=col2()+2 ; * Skip over lf$
 matparse work into line using tab$
  • Mapping done using dimmed arrays
 FOR C=1 TO NO_COLS
    COL=map(C)
    ROW(C)=trim(LINE(COL))
 NEXT C   
 IF NOT(NUM(INSTALL)) THEN INSTALL="
 DESC=TRIM(BRAND_NAME:" ":PRODUCT:" ":TYPE:" ":DIMS:" ":MATERIAL)
 IF RATE_CODE THEN GOSUB WRITE_RATE
 CALL YIELD()

NEXT I

TIME2=TIMEDATE()

MSG="

MSG=Items import=:NO_LINES

MSG=Import start time=:TIME1

MSG=Import finish time=:TIME2

CALL BUILD_MSG(@WINDOW,MSG,"I")

RETURN

WRITE_RATE:

CONVERT @LOWER_CASE TO @UPPER_CASE IN UNIT

IF NOT(NUM(PRICEQ)) THEN RETURN

IF PRICEQ THEN RATE=RATE/PRICEQ

ITEM="

ITEM=DESC

ITEM=M"

ITEM=UNIT

ITEM=ICONV(RATE,"MD4")

ITEM=DATE

ITEM=" ;ITEM="

ITEM=DESC ;ITEM=Installation time"

ITEM=1 ;ITEM=INSTALL

ITEM=10000 ;ITEM=INSTALL*10000

ITEM=UNIT ;ITEM=Hours"

ITEM=ICONV(RATE,"MD4");ITEM="

ITEM=SUPP_CODE

ITEM=PROD_MAJOR

ITEM=PROD_MINOR

ITEM=COMM_MAJOR

ITEM=COMM_MINOR

WRITE ITEM ON RATE_BUILDUP,RATE_CODE ELSE NULL

RETURN


At 05 SEP 2008 02:38PM Bruce Cameron wrote:

Colin,

Not sure if this is faster then what you have as I haven't benchmarked but you could try…

osopen data from dosfile then

convert char(10) to "" in data
convert char(13) to @fm in data
data=delete(data,1,0,0)
totrows=dcount(data,@fm)
for val=1 to totrows
   row=data
  • delete the row from the variable
   data=delete(data,1,0,0)
  • Do your stuff
next val

end

Also, you could move your lines of..

 IF RATE_CODE Then 

AND

IF NOT(NUM(PRICEQ)) 

up in your code so that if they don't pass you don't have

to read and set any variables. EG.

 Row=Trim(Row)
 RATE_CODE =ROW
 IF RATE_CODE Then
    SUPP_CODE  =ROW
    PROD_MAJOR =ROW
    PROD_MINOR =ROW
    COMM_MAJOR =ROW
    COMM_MINOR =ROW
    BRAND_NAME =ROW
    PRODUCT    =ROW
    TYPE       =ROW
    DIMS       =ROW
    MATERIAL   =ROW
    PRICEQ     =ROW
    IF NOT(NUM(PRICEQ)) Else
       UNIT       =ROW
       RATE       =ROW
       INSTALL    =ROW
       IF NOT(NUM(INSTALL)) THEN INSTALL="
       DESC=TRIM(BRAND_NAME:" ":PRODUCT:" ":TYPE:" ":DIMS:" ":MATERIAL)
       CONVERT @LOWER_CASE TO @UPPER_CASE IN UNIT
       IF PRICEQ THEN RATE=RATE/PRICEQ
          ITEM="
          ITEM=DESC
          ITEM=M"
          ITEM=UNIT
          ITEM=ICONV(RATE,"MD4")
          ITEM=DATE
          ITEM=DESC:@VM:"Installation time"
          ITEM=1:@VM:INSTALL
          ITEM=10000:@VM:(INSTALL*10000)
          ITEM=UNIT:@VM:"Hours"
          ITEM=ICONV(RATE,"MD4")
          ITEM=SUPP_CODE
          ITEM=PROD_MAJOR
          ITEM=PROD_MINOR
          ITEM=COMM_MAJOR
          ITEM=COMM_MINOR
          WRITE ITEM ON RATE_BUILDUP,RATE_CODE ELSE NULL
       End
    End

Food for thought. BC


At 05 SEP 2008 07:54PM Colin Rule wrote:

No UTF8 is not being used.

Thanks anyway, not that I have any ieda what this does.

Colin


At 07 SEP 2008 01:36AM Warren Auyong wrote:

Isn't the bracket operator and col2() faster than a remove operator?

So why bother with the converts or swaps:

Equ CR$ to \0D\

Equ TAB$ to \09\

pos=1

header=data1,CR$

pos=col2() + 2 ;*(skip crlf)

Loop

line=datapos,CR$

until line eq ''

pos=col2() + 2

RATE_CODE=line1,TAB$

SUPP_CODE=linecol2()+1,TAB$

PROD_MAJOR=linecol2()+1,TAB$

PROD_MINOR=linecol2()+1,TAB$

COMM_MAJOR=linecol2()+1,TAB$

COMM_MINOR=linecol2()+1,TAB$

BRAND_NAME=linecol2()+1,TAB$

PRODUCT=linecol2()+1,TAB$

TYPE=linecol2()+1,TAB$

DIMS=linecol2()+1,TAB$

MATERIAL=linecol2()+1,TAB$

PRICEQ=linecol2()+1,TAB$

UNIT=linecol2()+1,TAB$

RATE=linecol2()+1,TAB$

INSTALL=linecol2()+1,TAB$

* build output record and trim it before write

repeat


At 07 SEP 2008 07:33PM Colin Rule wrote:

Thanks people.

I have implemented most of the changes, and each one seems to shave a millisecond or two, which over 500,000 records, all add up. Finished the import after 18 hours, so I will be trying a few more to see what has the biggest effect.

Colin


At 08 SEP 2008 06:07AM Colin Rule wrote:

Thanks Mike,

I have used a separate temp file to try the presize and it certainly makes the biggest difference, but…

I have the Dicts in the DATA folder of the Applicaiton, and the DATA files reside in one of many folders, one for each Project. When I try and delete a table in Database manager, it trashes the DICT for that table.

How can you delete the DATA portion only, and re-create the DATA only, and leave the Dictionaries alone… using the Database Manager/Table Builder?

Colin


At 08 SEP 2008 08:21AM Sean FitzSimons wrote:

Colin,

You will need to issue a CLEAR_TABLE command. There isn't a hook to this subroutine in the Database Manager/Table Builder at this point in time.

Sean

View this thread on the Works forum...

  • third_party_content/community/commentary/forums_works/54b20f19f18436aa852574bb002f1d36.txt
  • Last modified: 2023/12/30 11:57
  • by 127.0.0.1