dropped connection (OpenInsight 32-Bit)
At 14 NOV 2005 08:21:20AM Sandra D'Angelo wrote:
We have a client experiencing dropped connections to the database. At any point in the application, the system stops executing commands. The list boxes still work and you can cut and paste, but when they try to use a function that connects to the server, such as save, popup, or close window, the application does not respond. The client has two clustered Cisco 2950 switches, 2 teamed NIC cards, Windows 2003 server, and Windows XP workstations. They switched to XP about a year ago while using an AREV 3.12 product, and they started to lose their connections with the AREV product; it would just close out in the middle of a session. We switched them to OI 7.1 with Universal Driver 3.0.0.3 with TCP/IP configuration. The lost connections continued.
The problem seems to occur on two of the floors which connect to one of the switches. But we hired a network engineer to examine the network and he does not see any problems with the network. We are not receiving any error messages on workstations or any notices in the application log on either the server or the workstations.
Things we have tried to no avail:
1) Checked sleep mode on workstations and network cards
2) Changed the default time-out period for dropped idle connections on the Windows 2003 server.
3) Removed background indexing
4) Switched them to Universal Driver named pipes, NT driver 2.1 with TCP/IP and NT driver 2.1 Named pipes.
5) Checked the switches to see if any connections are constantly lit, looking for a bad network card.
6) Changed the network cables on the workstations that are having the problems.
Any assistance is greatly appreciated.
At 14 NOV 2005 09:32AM John Bouley wrote:
I am going to take a stab… I don't really understand what a clustered Cisco switch is or what a teamed NIC is, but I wonder if this could be causing problems with the service or UD. I would imagine that the service must be maintaining a connection by some sort of identifier in the server. I wonder what happens if the clustering decides that a particular workstation would really best be handled by a different NIC? Wouldn't this be treated like a new connection with the Revelation Service?
Just my two cents…
John
At 14 NOV 2005 10:00AM Sandra D'Angelo wrote:
Interesting theory. The workstations that do have the problem are leaving multiple sessions open on port 777. Even after rebooting the machine we still show 3 sessions open on the port for the workstation. Is the network holding open the session or the Driver? How do we close the session? Thanks, Sandra
At 14 NOV 2005 10:07AM John Bouley wrote:
On W2k3 you can right-click on My computer and select Manage. Open Shared Folders and select Sessions. You can then right-click and close the orphaned session.
However, following the same logic as my previous post, the real problem isn't how you close the orphaned connections but how you prevent them from occurring in the first place. I'm not sure I can address this if it is indeed caused by the network infrastructure. I think you will have to find some way of preventing these stations from moving between NICs in the cluster…
HTH,
John
At 14 NOV 2005 11:06AM Sandra D'Angelo wrote:
I just confirmed that the client's other site is running the same clustering and teamed NIC cards without problems. We are not seeing orphaned connections in Computer Management. However, we are seeing multiple port sessions open using netstat -0 command. We dropped the computer using net session \\tcpip address /delete but the port sessions are still open. We were unable to delete the session using the computer name, only the TCP/IP address. They are running Windows 2003 server service pack 1. Thanks again for your assistance, Sandra
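For anyone wanting to tally those port sessions per workstation rather than eyeball them, here is a minimal sketch in Python (not BASIC+). It assumes the server's netstat output has been saved as text in the usual Windows `netstat -an` column layout; the function name and the sample addresses are made up for illustration, and only port 777 comes from this thread.

```python
import re
from collections import Counter

# Matches lines such as (layout assumed from Windows `netstat -an`):
#   TCP    10.0.0.5:777    10.0.0.21:1045    ESTABLISHED
LINE_RE = re.compile(
    r"^\s*TCP\s+(\S+):(\d+)\s+(\S+):(\d+)\s+(\S+)", re.IGNORECASE
)

def sessions_by_client(netstat_text, port=777):
    """Count TCP entries on the local `port`, grouped by (remote address, state)."""
    counts = Counter()
    for line in netstat_text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # header lines, UDP entries, blank lines
        local_port = int(m.group(2))
        remote_addr = m.group(3)
        state = m.group(5)
        if local_port == port:
            counts[(remote_addr, state)] += 1
    return counts

# Hypothetical sample; real addresses and ports will differ.
SAMPLE = """
  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.5:777           0.0.0.0:0              LISTENING
  TCP    10.0.0.5:777           10.0.0.21:1045         ESTABLISHED
  TCP    10.0.0.5:777           10.0.0.21:1052         ESTABLISHED
  TCP    10.0.0.5:777           10.0.0.21:1060         ESTABLISHED
  TCP    10.0.0.5:777           10.0.0.34:1101         ESTABLISHED
  TCP    10.0.0.5:445           10.0.0.34:1102         ESTABLISHED
"""

if __name__ == "__main__":
    for (addr, state), n in sorted(sessions_by_client(SAMPLE).items()):
        print(addr, state, n)
```

A workstation that shows more entries than it has application sessions open, or that still shows entries after a reboot, would be a candidate for the orphaned sessions described above.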
At 14 NOV 2005 11:41AM dbakke@srpcs.com's Don Bakke wrote:
Sandra,
Is this the same setup you contacted us about a little while ago? I thought there was a suspicion that one of the network cards wasn't operating correctly. Was this not the case?
You may have to create a log file to track what is going on.
dbakke@srpcs.com
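Separately from the driver's own log, a workstation-side probe log can help line up drop times across the two affected floors. The following is only a rough Python sketch under stated assumptions: the host name, port, and log path are placeholders you would supply, and `probe` and `log_probes` are names invented here. Note that it only tests whether a new TCP connection can be opened; it says nothing about whether an existing persistent session survived, and the service will see each probe as a short-lived connection.

```python
import socket
import time
from datetime import datetime

def probe(host, port, timeout=3.0):
    """Return True if a new TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def log_probes(host, port, log_path, interval=30.0, rounds=None):
    """Append one timestamped OK/FAIL line per probe.

    rounds=None keeps probing until the script is interrupted.
    """
    done = 0
    while rounds is None or done < rounds:
        status = "OK" if probe(host, port) else "FAIL"
        with open(log_path, "a") as log:
            log.write(f"{datetime.now().isoformat()} {host}:{port} {status}\n")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Running it on one affected and one unaffected workstation at the same time, then comparing the FAIL timestamps, would show whether the outages are tied to a floor or switch or to individual machines.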
At 14 NOV 2005 12:17PM Sandra D'Angelo wrote:
Yes it is. The network cards do not seem to be an issue and we did run a log file and only got error 100's which I was told not to worry about.
At 14 NOV 2005 01:02PM Ray Chan wrote:
Sandra,
Just curious, but does this happen after "X" number of users log on?
Ray Chan
At 14 NOV 2005 02:27PM Sandra D'Angelo wrote:
Yes. They have 1 user developer license and 250 runtime license. They are peaking at about 20 concurrent users. The system generally has the most problems during 10am-4pm which is when the most users are on the system. I would say that once more than 10 users access the system they start to have problems. Thanks, Sandra
At 14 NOV 2005 02:46PM Ray Chan wrote:
Sandra,
Don't know if this is applicable or helpful, but does this sound similar?
Ray Chan
At 14 NOV 2005 09:17PM Dimitri Mandelis wrote:
Well, I'll throw my 2 cents in here with some comments to isolate it.
Since the problem appears to happen with both Arev and OI, and only on 2 floors, it would appear to be a problem with the network or the server. I would swap out the switches 1 at a time (preferably with one from the location that worked) to see if the problem switched locations or stayed with the original location. If nothing changed with either switch, well, then you know it's not those.
I would then try changing network cards on the server to a different manufacturer. If that didn't help I would build a small temporary server out of a PC and copy the database over and try it on that for a day to rule out the server.
I hope that helps
Dimitri
At 15 NOV 2005 02:29PM Sandra D'Angelo wrote:
Thank you for your suggestions. I too think it's network but I get the usual response that "all the other applications" are working. It is hard to explain that our app needs a persistent connection, unlike Excel and Word. We have noticed that the workstations have very long names, so I am checking into whether they are being truncated by OI/Arev. I have also talked them into trying another server, which will help us rule out the network card and server. I will try talking them into swapping out the switches but that is a harder battle to fight for some reason. Thanks again, Sandra
At 15 NOV 2005 03:01PM Karen Oland wrote:
We have seen the same problem with a user who has a large number of connections (>100) after they switched from Novell to Windows for the server (it absolutely NEVER happened with Novell and the NLM). No other hardware was changed and it happens more to their accounting users (who do lots of transactions) and as far as I know, never to data entry users (who run practically no programs that update/lock large numbers of users at a time).
The connection to the server simply drops, the program appears to continue (writes succeed, but don't take place, no errors get generated, so readnext appears to gracefully return EOF, etc). We have gotten around this by some extra backups of data to be posted and a manual process to verify the posting truly succeeded - when this occurs, we reboot the workstation (often there are orphaned locks, otherwise) and manually finish the posting (a support headache).
Changing various pieces of hardware never resulted in any real solution - it could be a switch issue, but the same switch connects to the old Novell server that never had the problem …
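The "verify the posting truly succeeded" step described above can be illustrated with a read-back check. This is a generic Python sketch, not BASIC+ and not the actual posting code: the `write`/`read` store interface, the class names, and the sample key are all hypothetical stand-ins. It also assumes the read-back really goes to the server; if reads are answered from a local cache, this check proves nothing.

```python
class WriteNotVerified(Exception):
    """Raised when a record read back after a write does not match what was written."""

def verified_write(store, key, record):
    """Write `record`, read it back, and fail loudly if it did not land."""
    store.write(key, record)
    stored = store.read(key)
    if stored != record:
        raise WriteNotVerified(f"write of {key!r} did not take place")
    return stored

class DictStore:
    """Stand-in for a working connection: writes are kept."""
    def __init__(self):
        self._rows = {}
    def write(self, key, record):
        self._rows[key] = record
    def read(self, key):
        return self._rows.get(key)

class DroppedStore(DictStore):
    """Stand-in for the failure described above: write reports success but stores nothing."""
    def write(self, key, record):
        pass  # silently dropped, no error raised

if __name__ == "__main__":
    verified_write(DictStore(), "INV*1001", {"amount": 125})
    try:
        verified_write(DroppedStore(), "INV*1001", {"amount": 125})
    except WriteNotVerified as err:
        print("caught:", err)
```

The point of raising an exception rather than returning a flag is that a posting run stops at the first write that did not land, instead of carrying on against a dead connection.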