NLM STATISTICS DOCUMENTATION (Networking Products)
At 16 APR 2003 01:30:27PM Ted Archibald wrote:
One of my clients is experiencing a major performance slowdown and I am trying to find the cause.
I am reviewing the NLM statistics to see if there is any problem evident there.
One entry in the Communication Statistics is "IN PROGRESS EVENTS" that I do not understand.
Is there any good documentation on how to interpret this and other aspects of the NLM? I hope that this is not an orphan piece of AREV. I have done some searches and found nothing helpful.
The reason I am concerned about "IN PROGRESS EVENTS" is that on some workstations it is high: 1000, 1500, etc. I would have guessed that this statistic would show the current high-water mark for the specific workstation and that it would be in the range of 5 to 20 outstanding requests that have not been resolved yet. The value for TOTAL is 88,231. If this number represents a list of unresolved requests in the NLM, and the NLM has to scan this list frequently, then this might be a cause of slowness and of great concern.
I would appreciate assistance from anyone who has wandered into this part of the forest.
Cheers from the Wet coast in Vancouver.
At 16 APR 2003 01:51PM [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] wrote:
This IS a cause for concern. To quote the NLM documentation: "In Progresses are messages sent by the server to the client in response to the client resending the original message. This situation occurs when a client has sent a message to the server that requires processing, but before the server has finished processing the request, the retry interval expires, which triggers the client to resend the original request."
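The mechanism in that quote can be sketched as a small model. This is only an illustration of the quoted description, not the NLM's actual logic: the function names, the one-second retry interval and the processing times are all hypothetical, and it assumes the client resends at a fixed interval until the final reply arrives.

```python
import math


def count_in_progress(processing_time: float, retry_interval: float) -> int:
    """Resends (each answered with an In Progress message) for one request.

    Assumption: the client resends every `retry_interval` seconds until the
    server's final reply arrives. A resend that would coincide exactly with
    the final reply is not counted.
    """
    if retry_interval <= 0:
        raise ValueError("retry_interval must be positive")
    if processing_time <= 0:
        return 0
    return math.ceil(processing_time / retry_interval) - 1


def total_in_progress(processing_times, retry_interval: float) -> int:
    """Sum of In Progress replies over a whole workload."""
    return sum(count_in_progress(t, retry_interval) for t in processing_times)


# Hypothetical workloads: 1000 quick requests versus 1000 slow ones.
fast_requests = [0.1] * 1000
slow_requests = [2.5] * 1000

print(total_in_progress(fast_requests, retry_interval=1.0))
print(total_in_progress(slow_requests, retry_interval=1.0))
```

Under these assumptions the count stays at zero while the server answers within the retry interval and climbs quickly once requests take longer than that, which is why a high figure points at slow server-side processing rather than at a backlog.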
We've been doing a LOT of Novell speed/stability consultancy recently and there are a lot of potential factors. Let's start with Novell versions, fixes applied, workstation clients and largest LK/OV sizes along with number of clients and any routing? Also the contents of your lhstart.ncf, your autoexec.nt, config.nt and AREV batch file.
World Leaders in all things RevSoft
At 16 APR 2003 05:47PM Ted Archibald wrote:
Hi Sprezzatura Group
Thanks for the help
Symptoms
- system getting really slow
- In Progress stats seems to be high
Here is current environment
Novell 4.11
-SP 9 installed but removed because of performance
-SP 8 installed
Workstation client on DOS systems (the ones affected)
- client uses about 8 DOS systems to do major auto batch processing
- workstations use Win98SE - seem not to have problem with stats
- using QEMM to squeeze more memory than Microsoft mem mgr
- LSL, 3C90X.COM, IPXODI, VLM
Arev Batch
- lhipxtsr.exe /p/r:60
- command.com /e:3072 /c K:\arev\arev.exe %1 /sxm4096
Autoexec.ncf
- load lh /s:1
- load lhipxser /m:300 /p:1472
-- (the m:300=5 hrs is to allow arev to suspend for a long time without the NLM disconnecting)
xxx.NT files not existing - only DOS systems
Clients - max 29 on AREV licence, 50 on Novell licence
Arev File sizes in Revboot
Name       LK MB   OV MB
SYSTEMP       55    3613   (really big - must clear up manually)
                           could this be a problem for NLM performance?
LISTS          2     251   (I will clear file)
-- I have nightly purge that cleans up all lists with old dates
-- For everything else I usually clean up every 2 months or so
Application files - large but basically balanced
Name       LK MB   OV MB
File 1       163     819
File 2       205     390
File 3       148     378
File 4       102     109
File 5        97      80
File 6        78      90
No Routing - just 1 subnet
100 bps network with 3com switches (no hubs)
Other notes
Server is IBM 340 rack server with 1GB memory
Utilization occasionally peaks to 80% for several min.
Normal utilization is 5 - 20%
Other servers on network: backup server, Linux database server
Anything here that might be a problem ?
I will get any other detail that you think may be important.
Thanks
Ted Archibald
At 16 APR 2003 06:37PM [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] wrote:
Nothing SCREAMS problem! We have our suspicions but we'd need to do a lot closer observation on file sizing etc. By all means check files with large overflow and presize them too big and lock them. This should stop some churning.
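As a rough way to pick out the files with large overflow, the LK and OV sizes posted earlier in this thread can be ranked by their OV/LK ratio. The sketch below uses those posted figures; treating the ratio as a triage signal is an assumption made here for illustration, not a documented sizing rule, and the function name is made up.

```python
# (LK MB, OV MB) as posted earlier in this thread.
files = {
    "SYSTEMP": (55, 3613),
    "LISTS": (2, 251),
    "File 1": (163, 819),
    "File 2": (205, 390),
    "File 3": (148, 378),
    "File 4": (102, 109),
    "File 5": (97, 80),
    "File 6": (78, 90),
}


def overflow_ratios(sizes):
    """Return (name, OV/LK ratio) pairs, largest ratio first."""
    ratios = [(name, ov / lk) for name, (lk, ov) in sizes.items()]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


for name, ratio in overflow_ratios(files):
    print(f"{name:8s} OV/LK = {ratio:6.1f}")
```

On these numbers LISTS and SYSTEMP stand far apart from the application files, so they would be the first candidates for clearing or resizing.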
On our last such exercise we spent two weeks tracking down all of the variants and getting the system stable and 'pukka' again. It would seem to be cheaper for the client to do the batch processing in a DOS box on stable machines running Windows 98 that do not have this problem. If there is a speed problem it would still be cheaper to buy faster machines.
At 16 APR 2003 06:50PM Ted Archibald wrote:
Hi Sprezzatura Group
Thanks for fast responses.
Back to my original question.
Where can I get as much doc on NLM as possible?
I want architecture, design, logic, interpretation of stats, everything.
I will be cleaning up the big files and locking to prevent "churning"
I will keep this thread posted with status.
Cheers
Ted Archibald
At 16 APR 2003 07:26PM [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] wrote:
Regretfully the documentation (fully perusable on this web site), these threads and experience are pretty much it. We have a tool for the NT Service that allows us to study this sort of stuff more closely but we haven't bothered moving it to Novell because it'll work with both NT and Novell with the new universal driver and we have had no pressing demand. We do intend "shrink wrapping" it for the new universal driver.
At 17 APR 2003 11:31AM Ted Archibald wrote:
Thanks again for quick response
Back to the problem of "In Progress" counters being high.
You said this meant a "time out" occurred and the request was not satisfied and the counter was not decremented. What "time out" are we talking about? Is this a parameter that I have control over?
Is this a request from client to NLM that the NLM does not answer and the request is sent again by the client? If so, under what circumstances would the NLM not answer at all? Is this an NLM "time out" or a client "time out"? If this is a client "time out" then why does the NLM keep a record of it and not the client?
Could this have been caused by the long time spent searching through a file such as the 1GB systemp OV file in my example?
Perhaps you could help me with the exact logic of the NLM: how does it handle a client request? How many threads can it handle? What is the logic of the "time out"? What are the priorities of processing within the NLM, etc.? Surely there must be some documentation. To allow such a critical piece of software to be without detailed docs is negligent.
I really would like to get to the bottom of this problem and you seem to be the only one with some knowledge on this problem brave enough to jump in to help. I thank you but I still have a major problem on my hands.
Back to the client. I have removed the 1GB OV systemp and the huge LISTS files and will see how things run today. Hopefully this was the cause.
I will keep you posted. Please send me more details on the NLM.
Thanks
Ted Archibald
At 17 APR 2003 11:58AM Victor Engel wrote:
]100 bps network with 3com switches (no hubs)
]Anything here that might be a problem ?
This could be a problem if it is as written. Perhaps what you really meant was 100 Mbps.
What other traffic (besides Arev) is on the subnet? Any applications with lots of large packets?
At 17 APR 2003 12:05PM Victor Engel wrote:
I, like you, have wished for such documentation, but I don't think it exists or has been made public. I think the statistic you are looking at is cumulative, and not a high-water mark as you suggest. So a number like 88,000 doesn't mean there is a list somewhere of 88,000 entries. More likely, there were 88,000 distinct instances that were either subsequently resolved or dropped.
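To make the distinction concrete, here is a minimal sketch of a counter that tracks both kinds of figure. The class and its method names are hypothetical and say nothing about how the NLM keeps its statistics; the point is only that a cumulative total can reach 88,000 while nothing is outstanding.

```python
class RequestStats:
    """Tracks a cumulative count alongside a high-water mark."""

    def __init__(self):
        self.outstanding = 0  # requests currently unresolved
        self.high_water = 0   # most requests ever outstanding at once
        self.cumulative = 0   # every request ever seen; never decremented

    def start(self):
        self.outstanding += 1
        self.cumulative += 1
        self.high_water = max(self.high_water, self.outstanding)

    def finish(self):
        if self.outstanding == 0:
            raise RuntimeError("finish() without a matching start()")
        self.outstanding -= 1


stats = RequestStats()
for _ in range(88_000):
    stats.start()   # the event begins...
    stats.finish()  # ...and is resolved before the next one arrives

print(stats.cumulative, stats.high_water, stats.outstanding)
```

In this run every event is resolved before the next begins, so the cumulative figure is large while the high-water mark never exceeds one and no list of pending entries exists.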
Unfortunately, running benchmarks and tests seems to be the only way to deduce more information from these statistics than is mentioned in the NLM docs.
If there is more documentation available, I'd certainly like to see it. I've asked for it before, but I've never seen anything other than what is in the forums and the NLM docs.
At 17 APR 2003 01:31PM [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] wrote:
Ted
We sympathise with the problems that you are having with your client. A 1GB overflow frame is going to do nobody any favours.
We are unsure as to how to interpret your closing request. We are a consultancy. If you wish to employ us you are free to do so. When we are posting gratis in areas which overlap those where we actually make money we have to remain circumspect.
You quote: "You said this meant a 'time out' occurred and the request was not satisfied and the counter was not decremented." We cut and pasted from the manual we suggested you review. That is the extent of the commentary in the manual.
Regards
At 20 APR 2003 04:43PM Ted Archibald wrote:
Thanks Victor
I will do some more digging. I know there is lots of other stuff on the network and maybe some of it is causing a problem. I have looked at the statistics on the 3com switch and nothing seems wrong. I do not have much experience in interpretation of switch stats so anything might be wrong. There were lots of packets 64 bytes and less but not very many at the large size. I will take another look.
Thanks again.
At 20 APR 2003 04:57PM Ted Archibald wrote:
Hi Sprezzatura
Please don't get me wrong. I was just getting rather upset with my lack of understanding of what was going on inside of the NLM.
You people are the greatest (together with WinWin) in the AREV and OI world and I have benefited from your vast knowledge and assistance in the past and hopefully will be able to in the future.
The comments were aimed more at the original developers of the NLM or their heirs. I was hoping that either you people had some documentation tucked away in your stack of things germane to the topic or that someone from development might step in with the goods.
Thanks again and I am sorry that I seemed ungrateful.
Cheers
Ted Archibald
At 20 APR 2003 05:02PM Ted Archibald wrote:
Victor
I am going to give Mike a call and see what is available.
Thanks for your support.
Ted
At 21 APR 2003 02:50PM [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] wrote:
Ted
We have nothing that we currently intend making public domain. Regretfully this is one of those areas where information has real value. Part of consultancy is the accretion of specialist knowledge over time. Such knowledge is normally accorded weighted values. The knowledge about network performance is one that is of most value to customers with huge amounts of data, who are normally prepared to pay for the investigation required for the acquisition of said knowledge. Implicit in such knowledge gathering is the fact that these customers will benefit from research undertaken for other customers. They are buying not just the time but the whole gestalt. For us to make such information publicly available would both undermine our commercial base and short change those who have paid for it.
From time to time we make information publicly available to assist the Revelation community. Normally this is information which it is unlikely we would be able to gain real value from in the world of consulting OR where the loss engendered by the release of such knowledge is outweighed by the positive karma engendered by its release.
The areas you seek information on regretfully do not fall into that latter category.