Large file import (OpenInsight 32-Bit)
At 05 SEP 2008 04:34:36AM Colin Rule wrote:
I am importing 515,000 records from a tab delimited file, 150Mb.
I am using the 'Remove LINE From DATA AT position Setting flag' facility to extract the data quickly, create a record with about 20 fields and write it to the database.
I ran the import last night, started at 9:00pm, still running at 8:00am the next day, 60% complete.
There is little left I can do optimise the code, but I have made a few small changes today, and will try again.
I am running locally on a PC, not connected to a server, so no interface with the LH drivers etc, and my PC is reasonably quick.
Does anyone have any suggestions as to ways to make this run faster, at least so that it can be run in a timeframe that does not involve a weekend.
Thanks
Colin
At 05 SEP 2008 06:34AM [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] wrote:
Post the code ;)
World leaders in all things RevSoft
At 05 SEP 2008 07:18AM Colin Rule wrote:
Code is below, with bits that are irrelevant stripped out.
CONVERT @LOWER_CASE TO @UPPER_CASE IN FILEOSOPEN FILE TO DOSFILE ELSE RETURNCALL UTILITY("CURSOR","H")OSREAD DATA FROM DOSFILE ELSE RETURNOSCLOSE DOSFILESWAP CHAR(13) WITH @FM IN DATASWAP CHAR(10) WITH "" IN DATAPosition=1MAX_LINES=.MAX_LINES-]TEXTNO_LINES=COUNT(DATA,@FM)+1CLEARFILE RATE_BUILDUP ELSE NULLCLEARFILE RATE_INDEX ELSE NULLFOR I=2 TO NO_LINESRemove LINE From DATA AT position Setting flagSWAP CHAR(9) WITH @FM IN LINEROW="FOR C=1 TO NO_COLSCOL=COLUMNSROW=LINENEXT CRATE_CODE =TRIM(ROW)SUPP_CODE =TRIM(ROW)PROD_MAJOR =TRIM(ROW)PROD_MINOR =TRIM(ROW)COMM_MAJOR =TRIM(ROW)COMM_MINOR =TRIM(ROW)BRAND_NAME =TRIM(ROW)PRODUCT =TRIM(ROW)TYPE =TRIM(ROW)DIMS =TRIM(ROW)MATERIAL =TRIM(ROW)PRICEQ =TRIM(ROW)UNIT =TRIM(ROW)RATE =TRIM(ROW)INSTALL =TRIM(ROW)IF NOT(NUM(INSTALL)) THEN INSTALL="DESC=TRIM(BRAND_NAME:" ":PRODUCT:" ":TYPE:" ":DIMS:" ":MATERIAL)IF RATE_CODE THEN GOSUB WRITE_RATECALL YIELD()NEXT ITIME2=TIMEDATE()MSG="MSG=Items import=:NO_LINESMSG=Import start time=:TIME1MSG=Import finish time=:TIME2CALL BUILD_MSG(@WINDOW,MSG,"I")RETURN
WRITE_RATE:
CONVERT @LOWER_CASE TO @UPPER_CASE IN UNITIF NOT(NUM(PRICEQ)) THEN RETURNIF PRICEQ THEN RATE=RATE/PRICEQITEM="ITEM=DESCITEM=M"ITEM=UNITITEM=ICONV(RATE,"MD4")ITEM=DATEITEM=" ;ITEM="ITEM=DESC ;ITEM=Installation time"ITEM=1 ;ITEM=INSTALLITEM=10000 ;ITEM=INSTALL*10000ITEM=UNIT ;ITEM=Hours"ITEM=ICONV(RATE,"MD4");ITEM="ITEM=SUPP_CODEITEM=PROD_MAJORITEM=PROD_MINORITEM=COMM_MAJORITEM=COMM_MINORWRITE ITEM ON RATE_BUILDUP,RATE_CODE ELSE NULLRETUR
At 05 SEP 2008 07:21AM Mike Ruane wrote:
Colin-
If this is a new file, when you make it, presize it.
If it it not a new file, make a new one, presize it, copy the existing records in, and then import.
If there is indexing on the table, take it off, import, put it back on.
Hope it helps-
Mike
At 05 SEP 2008 07:48AM [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] wrote:
In addition to Mike's comments a few minor coding points spring straight to mind from an optimization POV:
1) Consider using convert instead of swap. It's faster. e.g
Instead of
SWAP CHAR(13) WITH @FM IN DATASWAP CHAR(10) WITH "" IN DATADo:
convert \0D0A\ to \FE\ in dataLikewise for
SWAP CHAR(9) WITH @FM IN LINEdo
convert \09\ to \FE\ in lineUse embedded characters with the \xx\ notation rather than the char() function, because char() is an opcode and therefore has overhead
2) Consider yield() less - just call it every 100 iterations for example …
3) Do you need to put each item from row into a separate variable? If there's no processing to do on a field in ROW then there's little point putting it into another variable before transferring it into the ITEM record. Just more overhead (though it does make the code more readable, but then equates would help in that situation …)
World leaders in all things RevSoft
At 05 SEP 2008 08:37AM Stefano Cavaglieri wrote:
Colin -
Does your OI application run in UTF8 character mode? If yes, this is a know issue (at least I know it is), consider a for/next loop instead of remove/from/at.
- Stefano
At 05 SEP 2008 12:57PM Richard Hunt wrote:
A few more efficient comments…
1) The SWAP CHAR(9) WITH @FM sentence should be done before the REMOVE sentence loops. Use the sentence CONVERT CHAR(9) TO @VM IN LINE Using @VM will cause you to modify other sentences in your program.
2) The TRIM function… Use the TRIM function once right after the REMOVE sentence. The TRIM function is a multivalue function.
3) When combining values within a field try using only one sentence. What I mean is for sentence RECORD=LINE and RECORD=LINE replace those sentences with RECORD=LINE:@VM:LINE
4) Is it possible the the file you are importing to has indexes???
At 05 SEP 2008 02:13PM Bob Carten wrote:
Colin - Others have made the most valuable the suggestions, especially the suggestion to presize the files and sizelock them.
I made a quick pass with a couple of other optimizations
* pull everything possible out of the main loop
* don't bother with swaps, just use char(13) and char(9) as delimiters
* store intermediate values in dimensioned arrays rather than dynamic arrays
* I did not change the Write part, but you could use dimensioned arrays, matwrite there.
* note - I just typed this out, not tested
- Bob
CONVERT @LOWER_CASE TO @UPPER_CASE IN FILE
OSOPEN FILE TO DOSFILE ELSE RETURN
CALL UTILITY("CURSOR","H")
OSREAD DATA FROM DOSFILE ELSE RETURN
OSCLOSE DOSFILE
equ cr$ to \0D\
equ tab$ to \09\
NO_LINES=COUNT(DATA,@FM)+1
* Sizelock before clear
equ set_sizelock$ to 5
call Fix_LH(RATE_BUILDUP, set_sizelock$, 0,1)
call Fix_LH(RATE_INDEX, set_sizelock$, 0,1)
CLEARFILE RATE_BUILDUP ELSE NULL
CLEARFILE RATE_INDEX ELSE NULL
* Use matrices instead of dynamic array for intermediate data
DIM ROW(NO_COLS)
DIM LINE(NO_COLS)
DIM MAP(NO_COLS)
BLANK="
MATPARSE BLANK INTO ROW
MATPARSE BLANK INTO LINE
MATPARSE BLANK INTO MAP
* Use equates to avoid intermediate variables
equ RATE_CODE to ROW(1)
equ SUPP_CODE to ROW(2)
equ PROD_MAJOR to ROW(3)
equ PROD_MINOR to ROW(4)
equ COMM_MAJOR to ROW(5)
equ COMM_MINOR to ROW(6)
equ BRAND_NAME to ROW(7)
equ PRODUCT to ROW(8)
equ TYPE to ROW(9)
equ DIMS to ROW(10)
equ MATERIAL to ROW(11)
equ PRICEQ to ROW(12)
equ UNIT to ROW(13)
equ RATE to ROW(14)
equ INSTALL to ROW(15)
* Build map outside of loop
FOR C=1 TO NO_COLS
MAP(C)=COLUMNSNEXT C
* Main loop
FOR I=2 TO NO_LINES
parse on line feeds and tabs, don't need the swapswork=dataipos,lf$ipos=col2()+2 ; * Skip over lf$matparse work into line using tab$
Mapping done using dimmed arraysFOR C=1 TO NO_COLSCOL=map(C)ROW(C)=trim(LINE(COL))NEXT CIF NOT(NUM(INSTALL)) THEN INSTALL="DESC=TRIM(BRAND_NAME:" ":PRODUCT:" ":TYPE:" ":DIMS:" ":MATERIAL)IF RATE_CODE THEN GOSUB WRITE_RATECALL YIELD()NEXT I
TIME2=TIMEDATE()
MSG="
MSG=Items import=:NO_LINES
MSG=Import start time=:TIME1
MSG=Import finish time=:TIME2
CALL BUILD_MSG(@WINDOW,MSG,"I")
RETURN
WRITE_RATE:
CONVERT @LOWER_CASE TO @UPPER_CASE IN UNIT
IF NOT(NUM(PRICEQ)) THEN RETURN
IF PRICEQ THEN RATE=RATE/PRICEQ
ITEM="
ITEM=DESC
ITEM=M"
ITEM=UNIT
ITEM=ICONV(RATE,"MD4")
ITEM=DATE
ITEM=" ;ITEM="
ITEM=DESC ;ITEM=Installation time"
ITEM=1 ;ITEM=INSTALL
ITEM=10000 ;ITEM=INSTALL*10000
ITEM=UNIT ;ITEM=Hours"
ITEM=ICONV(RATE,"MD4");ITEM="
ITEM=SUPP_CODE
ITEM=PROD_MAJOR
ITEM=PROD_MINOR
ITEM=COMM_MAJOR
ITEM=COMM_MINOR
WRITE ITEM ON RATE_BUILDUP,RATE_CODE ELSE NULL
RETURN
At 05 SEP 2008 02:38PM Bruce Cameron wrote:
Colin,
Not sure if this is faster then what you have as I haven't benchmarked but you could try…
osopen data from dosfile then
convert char(10) to "" in dataconvert char(13) to @fm in datadata=delete(data,1,0,0)totrows=dcount(data,@fm)for val=1 to totrowsrow=data
delete the row from the variabledata=delete(data,1,0,0)
Do your stuff
next valend
Also, you could move your lines of..
IF RATE_CODE ThenAND
IF NOT(NUM(PRICEQ))up in your code so that if they don't pass you don't have
to read and set any variables. EG.
Row=Trim(Row)RATE_CODE =ROWIF RATE_CODE ThenSUPP_CODE =ROWPROD_MAJOR =ROWPROD_MINOR =ROWCOMM_MAJOR =ROWCOMM_MINOR =ROWBRAND_NAME =ROWPRODUCT =ROWTYPE =ROWDIMS =ROWMATERIAL =ROWPRICEQ =ROWIF NOT(NUM(PRICEQ)) ElseUNIT =ROWRATE =ROWINSTALL =ROWIF NOT(NUM(INSTALL)) THEN INSTALL="DESC=TRIM(BRAND_NAME:" ":PRODUCT:" ":TYPE:" ":DIMS:" ":MATERIAL)CONVERT @LOWER_CASE TO @UPPER_CASE IN UNITIF PRICEQ THEN RATE=RATE/PRICEQITEM="ITEM=DESCITEM=M"ITEM=UNITITEM=ICONV(RATE,"MD4")ITEM=DATEITEM=DESC:@VM:"Installation time"ITEM=1:@VM:INSTALLITEM=10000:@VM:(INSTALL*10000)ITEM=UNIT:@VM:"Hours"ITEM=ICONV(RATE,"MD4")ITEM=SUPP_CODEITEM=PROD_MAJORITEM=PROD_MINORITEM=COMM_MAJORITEM=COMM_MINORWRITE ITEM ON RATE_BUILDUP,RATE_CODE ELSE NULLEndEndFood for thought. BC
At 05 SEP 2008 07:54PM Colin Rule wrote:
No UTF8 is not being used.
Thanks anyway, not that I have any ieda what this does.
Colin
At 07 SEP 2008 01:36AM Warren Auyong wrote:
Isn't the bracket operator and col2() faster than a remove operator?
So why bother with the converts or swaps:
Equ CR$ to \0D\
Equ TAB$ to \09\
pos=1
header=data1,CR$
pos=col2() + 2 ;*(skip crlf)
Loop
line=datapos,CR$
until line eq ''
pos=col2() + 2
RATE_CODE=line1,TAB$
SUPP_CODE=linecol2()+1,TAB$
PROD_MAJOR=linecol2()+1,TAB$
PROD_MINOR=linecol2()+1,TAB$
COMM_MAJOR=linecol2()+1,TAB$
COMM_MINOR=linecol2()+1,TAB$
BRAND_NAME=linecol2()+1,TAB$
PRODUCT=linecol2()+1,TAB$
TYPE=linecol2()+1,TAB$
DIMS=linecol2()+1,TAB$
MATERIAL=linecol2()+1,TAB$
PRICEQ=linecol2()+1,TAB$
UNIT=linecol2()+1,TAB$
RATE=linecol2()+1,TAB$
INSTALL=linecol2()+1,TAB$
* build output record and trim it before write
repeat
At 07 SEP 2008 07:33PM Colin Rule wrote:
Thanks people.
I have implemented most of the changes, and each one seems to shave a millisecond or two, which over 500,000 records, all add up. Finished the import after 18 hours, so I will be trying a few more to see what has the biggest effect.
Colin
At 08 SEP 2008 06:07AM Colin Rule wrote:
Thanks Mike,
I have used a separate temp file to try the presize and it certainly makes the biggest difference, but…
I have the Dicts in the DATA folder of the Applicaiton, and the DATA files reside in one of many folders, one for each Project. When I try and delete a table in Database manager, it trashes the DICT for that table.
How can you delete the DATA portion only, and re-create the DATA only, and leave the Dictionaries alone… using the Database Manager/Table Builder?
Colin
At 08 SEP 2008 08:21AM Sean FitzSimons wrote:
Colin,
You will need to issue a CLEAR_TABLE command. There isn't a hook to this subroutine in the Database Manager/Table Builder at this point in time.
Sean