
At 23 NOV 1998 05:17:20PM Victor Engel wrote:

In investigating a problem on our production system, I am finding an inconsistency in operation. To narrow down the problem, I am testing on a copy of the system located on a hard drive and comparing its operation to the network version. The data record being investigated is identical, as are the program and the MFS. Actually, there are two MFSs. One, AUDITLOG.MFS, is part of HR-1 and has no source code available. It handles many things automatically, such as record locking (supposedly). Listed AFTER this MFS in REVMEDIA is my own MFS, EMP_DELTA, whose job is to track changes to the record. Recent problems have made me suspect one of these two MFSs, so I put a trace at the very beginning of my MFS. The trace simply logs the values of all the passed variables. The chart below compares the operation of the same program run locally vs. on the network. In addition to this test, I also attached the local files using the network system and vice versa. It was always the network version that broke to the debugger with an unassigned variable in RTP8 (WRITEV).

In the chart I have included basic information, such as NAME and OPERATION, for each logged operation. In some cases, the network version executed more than one operation for the corresponding local one. I have no idea why. Below the chart is a copy of the test program.

Local                               Network
open.file NAME=EMP*HRIS             open.file NAME=EMP*HRIS
READ.RECORD NAME=40716 STATUS=0     READ.RECORD NAME=40716 STATUS=Unassigned
-                                   READ.RECORD NAME=40716 STATUS=Unassigned
-                                   READ.RECORD NAME=40716 STATUS=0
WRITE.RECORD NAME=40716             WRITE.RECORD NAME=40716
-                                   READ.RECORD NAME=9999
UNLOCK.RECORD NAME=40716            UNLOCK.RECORD NAME=40716

Here is the test program:

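OPEN 'EMP' TO EMP THEN
   * Read field 1238 of record 40716 into FUNCT_UNIT
   READV FUNCT_UNIT FROM EMP,40716,1238 THEN
      * Overwrite the field with a test value...
      WRITEV 9999 ON EMP,40716,1238
      * ...then write the original value back
      WRITEV FUNCT_UNIT ON EMP,40716,1238
   END
END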

If anyone has any ideas about what is going on, please let me know.

Victor


At 24 NOV 1998 04:00PM Victor Engel wrote:

I did a follow-up test with AUDITLOG.MFS (the MFS for which we have no source) taken out. The other MFS was left in place. This MFS logs the transaction being performed upon entry to the MFS. It also scans the record for any changes. I got roughly the same results, so that eliminates AUDITLOG.MFS as a factor. Here was my procedure:

1. ATTACH the data file

2. ATTACH the CT file (stores MFS transactions)

3. ATTACH the program file

4. Clear the CT file

5. Run the program listed in the previous message.

6. Perform a filecopy of the CT file to another location to isolate these changes

Repeat in the other environment. Here are the transactions that occurred:

1. OPEN.FILE (name=HRIS*EMP) both systems

2. READ.RECORD (name=40716) both systems

3. WRITE.RECORD (name=40716 {9999 is in record}) both systems

4. LOCK.RECORD (name=40716, fmc=2) both systems

5. READ.RECORD (name=9999) ONLINE ONLY

6. UNLOCK.RECORD (name=40716, fmc=2) ONLINE ONLY

7. WRITE.RECORD (name=40716 {5491 is in record}) both systems

8. LOCK.RECORD (name=40716, fmc=2) ONLINE ONLY

9. READ.RECORD (name=9999) ONLINE ONLY

10. READ.RECORD (name=9999) ONLINE ONLY

11. UNLOCK.RECORD (name=40716) both systems

12. FLUSH both systems

13. FLUSH both systems

My question is, why is the operation different online vs. offline? Why the extra arm waving online? For example, what in the world is it doing reading 9999 from the file? Why is it doing it twice the second time? There is a Btree index on this field on both systems.


At 24 NOV 1998 06:42PM K Gilfilen wrote:

Victor,

In looking at your first posting, I noticed that the status() was not set. That implies to me that the operation failed at a very low level, something on the Microsoft/NT side of things, no doubt, and the AREV filing system simply stopped where it was. Probably something that did not have an "else" statement or anything to handle failure. (I hate seeing "if" and other such statements without else branches, but if you look through the old AREV source code, it's all written that way. Good thing they didn't work for NASA.)

The parent process timed out and retried the operation, and the third time was a charm. It would be nice for the filing system to set status to different values as it steps through its processes; that, of course, is water under the bridge. As John Madden stated (Cowboys vs Dolphins, 1993), the horse is already out of the barn.

The fact that it didn't happen on stand-alone implies that network functionality is not 100%, but that is made up through robustness, in the sense that something somewhere realized that the file operation failed and consequently retried.

Kenny


At 25 NOV 1998 03:18AM Victor Engel wrote:

I thought of this explanation. However, there are a couple of holes in it.

1) The value of the STATUS variable is being captured at the START of the MFS, so it has not yet been set. There is nothing wrong with its being unassigned at this point.

2) I would agree that the timeout explanation had some merit if there were some inconsistency in the number of retries. However, multiple attempts all show exactly the same three operations. I'm wondering if it has more to do with the network driver used. I'm writing this from home, but I believe the copy on the hard drive was using the non-networked driver, whereas the online version was using the NLM driver. I still don't understand some of the calls.


At 25 NOV 1998 04:26PM K Gilfilen wrote:

Well, we've definitely plumbed the depth of my knowledge here. It's pure speculation now, as I know next to nothing about IPX, and therefore about how AREV and Novell split up the work of file management.

Could IPX be dividing the Read into segments for sending? Maybe there's a limit imposed on packet size by SLIP?


At 26 NOV 1998 10:51AM [email protected] - Sprezzatura, Inc. (http://www.sprezzatura.com) wrote:

The driver shouldn't make a difference, but you could try using the NLM driver on the local copy. Even then, it goes through different logic if it is not the NLM.

Still, the network driver really relates to locking only, but with the NLM it is a bit different, since the LH access code must change as well.

Pre-NLM, there was a library of LH access programs which were linked with a specific set of locking routines for the specific network.

Are you sure the systems are identical?

[email protected]

Sprezzatura, Inc.

www.sprezzatura.com
