DMS Latency and Related Problems
"Latency" as used here refers to the noticeable time lag between an interactive action at a computer console and the visual effects of that action. An example is the typing of characters on the command line without the characters being immediately visible.
During the last several years, DMS faculty have from time to time reported instances of latency. Most of these reports concern latencies when the user is somewhere in their AFS login directory.
Possible causes of latency :
- The fileserver on which the user's login directory resides is overloaded, or has other problems
- The SAN (storage area network), on which most AFS volumes reside, is having a problem
- The user's computer is very heavily loaded, or not working properly
- There is a network problem, which can be anywhere along the path(s) from the user's computer to the fileserver or SAN (the SAN and all AFS fileservers are currently located in GITC 5302)
- Combinations of the above
First : Check http://my.njit.edu DMS for notices of outages that may be related to the problem.
Second : Use the following table to try to troubleshoot the problem.
Latency Troubleshooting
Symptom | Try this test | Comment |
---|---|---|
Latency in a directory (usually your login directory) | Does latency also exist in /afs/cad/common/temp/{A, B, C, D}? | If at least one of these directories is free of latency, the probable cause is a fileserver or SAN problem. See Note 1. |
Latency in a directory (continued) | Does the same latency also exist on a mathsun? (This test determines whether the problem is with your computer.) | If yes, the cause is not your computer. If no, the cause is your computer or your network connection. Check the load on your computer using top. |
Latency in a directory (continued) | Does the same latency also exist via a remote login from your computer to a computer in GITC, e.g., afs<n>? (This test largely eliminates the factor of the network from GITC to computers in Cullimore.) | If yes, the cause is probably a fileserver or SAN problem. If no, the cause is either the network between GITC and your computer, or your computer. |
Latency in a directory (continued) | Does the same latency also exist via a local login to a computer in GITC, e.g., afs<n>? (This test eliminates the factor of the network from GITC to computers in Cullimore.) | If yes, the cause is a fileserver or SAN problem. If no, the cause is either the network between GITC and your computer, or your computer. |
System response: "afs: Waiting for busy volume XYZ .." | None | Volume is being backed up or moved. See Note 2. |
Browser response for URL is slow, or stalls | Try lynx -dump <URL> on other machines, to see if the problem is isolated to your machine/network connection. Try the URLs http://www.njit.edu to test inside NJIT, and http://google.com to test outside NJIT. | Use process of elimination to determine if the problem is general, or specific to Cullimore, or specific to your computer/network connection. If the same problem exists generally, there is a network problem, or a problem at the target URL. |
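The first test in the table can be approximated with a short script rather than by feel. The following is a minimal sketch, not an official UCS tool: it times a directory listing in your login directory and in the four test directories, then applies the rule from the first table row. The function names and the 0.5-second cutoff are assumptions made for illustration; only the directory paths come from this page. A listing may be answered from the local AFS cache, so a fast result is weaker evidence than a slow one.

```python
import os
import time

# Test directories named on this page; each lives on a different fileserver.
TEST_DIRS = [f"/afs/cad/common/temp/{name}" for name in "ABCD"]

# Hypothetical cutoff (seconds) above which a listing counts as laggy.
SLOW_SECONDS = 0.5


def time_listing(path):
    """Return the seconds taken to list `path`, or None if it cannot be read."""
    start = time.perf_counter()
    try:
        with os.scandir(path) as entries:
            for _ in entries:
                pass
    except OSError:
        return None
    return time.perf_counter() - start


def is_slow(seconds, threshold=SLOW_SECONDS):
    """Treat an unreadable directory (None) as slow."""
    return seconds is None or seconds > threshold


def interpret(home_seconds, test_seconds, threshold=SLOW_SECONDS):
    """Apply the first table row to one round of timings."""
    if not is_slow(home_seconds, threshold):
        return "no latency seen in the login directory"
    if any(not is_slow(s, threshold) for s in test_seconds.values()):
        return "probable fileserver or SAN problem"
    return "latency everywhere: check your computer and network next"


if __name__ == "__main__":
    home = os.path.expanduser("~")
    home_secs = time_listing(home)
    timings = {d: time_listing(d) for d in TEST_DIRS}
    for path, secs in [(home, home_secs), *timings.items()]:
        print(path, "unreadable" if secs is None else f"{secs:.3f}s")
    print(interpret(home_secs, timings))
```

Running the script several times, both while the lag is noticeable and while it is not, gives a more useful comparison than a single measurement.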
Notes :
1. /afs/cad/common/temp/{A, B, C, D} exist on different fileservers. If at least one of these directories is free of latency, your computer and the network are unlikely to be the cause of the problem, which leaves a fileserver or the SAN as the cause.
2. The backup of volume XYZ is done on the XYZ.backup volume, which is a read-only clone of the read-write XYZ volume. This is done to prevent any inaccessibility for users while the volume is being backed up. Such inaccessibility is indicated by "afs: Waiting for busy volume XYZ in cell cad.njit.edu". If you encounter this message, please report it to ucssys@njit.edu.
3. A "busy volume" message may also result from a system administrator moving a volume to a new disk location.
4. UCS intends to put scripts in place that will aid users in gathering data for use in diagnosing latency problems. When available, those scripts will be referenced on this page.
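As an illustration of the kind of data that helps when reporting latency, the sketch below records the host name, the time, and the system load averages (the same figures that top shows in its header) and formats them as plain text. It is not the UCS script mentioned above. The function names, the report layout, and the "more than 1.0 load per CPU" rule are all assumptions, and os.getloadavg is available on Unix-like systems only.

```python
import os
import socket
import time


def load_snapshot():
    """Collect host name, local time, CPU count, and load averages."""
    one, five, fifteen = os.getloadavg()
    return {
        "host": socket.gethostname(),
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "cpus": os.cpu_count() or 1,
        "load": (one, five, fifteen),
    }


def heavily_loaded(snapshot, per_cpu_limit=1.0):
    """Hypothetical rule: 1-minute load above `per_cpu_limit` per CPU."""
    return snapshot["load"][0] > per_cpu_limit * snapshot["cpus"]


def format_report(snapshot, symptom):
    """Render the snapshot as plain text for pasting into a problem report."""
    one, five, fifteen = snapshot["load"]
    return "\n".join([
        f"Host: {snapshot['host']}",
        f"Time: {snapshot['time']}",
        f"CPUs: {snapshot['cpus']}",
        f"Load averages (1/5/15 min): {one:.2f} {five:.2f} {fifteen:.2f}",
        f"Heavily loaded: {'yes' if heavily_loaded(snapshot) else 'no'}",
        f"Symptom: {symptom}",
    ])


if __name__ == "__main__":
    print(format_report(load_snapshot(), "typed characters appear late"))
```

A report captured while the lag is happening helps separate a heavily loaded computer from a fileserver, SAN, or network problem.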