Errata/Troubleshooting for NetSolve, version 1.4


 History of NetSolve releases

 Version 1.0 : ??

 Version 1.1 : January, 1998

 Version 1.2 : February 15, 1999

 Version 1.3beta1-6 : sporadically in 2000

 Version 1.4 : July 31, 2001

To better address the needs of our NetSolve users, we are maintaining this Errata/Troubleshooting file. It explains the likely causes of specific NetSolve run-time error messages, lists known deficiencies in the NetSolve system, and provides up-to-date information on reported bugs and on how to download patches to NetSolve.

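This file contains:

  1. Errata in NetSolve Users' Guide
  2. Errata in NetSolve, version 1.4
  3. Bug Report Checklist
  4. Troubleshooting Run-Time Error Messages in NetSolve, version 1.4
  5. Known Deficiencies in NetSolve, version 1.4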

NetSolve has been tested on a variety of architectures.

In addition, NetSolve was tested with the following software: Mathematica version 4.0 for Linux with MathLink version 3.8; Matlab Version 6.0.0.88 Release 12 (Unix and Windows versions); NWS release 2.0; IBP version 1.0.1; PETSc 2.0.29; Aztec version 2.1; SuperLU version 1.1; ScaLAPACK version 1.6; and Java 1.2.


Errata in NetSolve Users' Guide

No known errata at this time.


Errata in NetSolve, version 1.4

No known errata at this time.


Bug Report Checklist

When reporting a suspected bug to the NetSolve mailing alias, please supply the following information. These are the first questions we will ask.

  1. On what type of machine did you install NetSolve (operating system and compiler)?
  2. What exact configure line did you use to configure NetSolve (see config.status)?
  3. Did you compile the client only, or the client, agent, and server?
  4. Have you included a cut-and-paste of the exact error message you encountered?
  5. If the error occurred at run time, did you consult the "Troubleshooting" section of this Errata file?
  6. If the error occurred at run time, did you check the nsagent.log and nsserver.log files for more information? What text did you find in these log files?


Troubleshooting Run-Time Error Messages in NetSolve, version 1.4

If an error occurs while NetSolve is being invoked, NetSolve prints a diagnostic run-time error message, and a NetSolve function called from the C or Fortran interface returns an error code. The error codes and run-time error messages are listed in Chapter 24 of the NetSolve Users' Guide, and each may have several possible causes. If one of these errors occurs, the user should first check the agent and server log files, $NETSOLVE_ROOT/nsagent.log and $NETSOLVE_ROOT/nsserver.log respectively, which may contain more information about the reason for the error.

NS: unknown problem
Possible causes: The user has requested a problem that is not serviced by any of the available servers. To check for this possibility, the user can invoke the NS_problems command and see whether the requested problem is included in the list of available services. To expand a server's capabilities, the user should refer to Chapter 13 of the NetSolve Users' Guide.
NS: no available server
Possible causes:
  1. A service zombie, i.e., a service process that has gone awry because it hung or was abnormally terminated. Such a process can be seen using ps -ef or ps -augx, and must be killed using kill -9 pid.
  2. The user could have requested a problem that is not serviced by any of the available servers. To check for this possibility, the user can invoke the NS_problems command and see whether the requested problem is included in the list of available services.
NS: impossible to bind to port
Possible causes:
  1. This error usually occurs when the user tries to start an agent on a machine on which an agent is already running. The running agent process could be owned by the user or by another user.
  2. Alternatively, another process, possibly owned by another user, may already be using the port requested for the agent process.
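Both causes above come down to one socket-level condition: a second bind() to a TCP port that another process already holds fails with EADDRINUSE. The following is a minimal POSIX sketch, not NetSolve code, that reproduces this condition on the loopback interface. The helper names try_bind, bound_port and port_in_use are hypothetical, and the sketch does not assume the actual agent port number, which is fixed in NetSolve/include/general.h.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to 127.0.0.1:port and start listening on it.
   Port 0 lets the kernel pick any free port.
   Returns the socket descriptor, or -1 with errno set on failure. */
int try_bind(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 5) < 0) {
        int saved = errno;          /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Return the local port number of a bound socket, or 0 on error. */
unsigned short bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    assert(fd >= 0);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}

/* Return 1 if `port` cannot be bound because it is already in use
   (EADDRINUSE), 0 if it could be bound, -1 on any other error. */
int port_in_use(unsigned short port)
{
    int fd = try_bind(port);
    if (fd >= 0) {
        close(fd);
        return 0;
    }
    return errno == EADDRINUSE ? 1 : -1;
}
```

In this sketch a first socket stands in for the already-running agent: bind it with try_bind(0), read its port with bound_port(), and port_in_use() on that same port then returns 1, which is the situation NetSolve reports as "impossible to bind to port".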
NS: Cannot contact agent
Possible causes:
  1. This error will occur if the agent specified by the NETSOLVE_AGENT environment variable does not match the @AGENT entry in the $NETSOLVE_ROOT/server_config file.
  2. Alternatively, the agent may not be responding, for whatever reason. The user can run the NS_config command to list the agents and servers reachable in the NetSolve configuration, or simply run the NS_killall command to kill the agent and server and then restart the processes.
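Cause 1 amounts to comparing two settings. The sketch below, assuming a POSIX system, performs that comparison. The function name agent_matches_config is hypothetical, and the code deliberately does not parse server_config itself, since the exact layout of the @AGENT entry is not described in this file: the caller passes the configured agent host as a plain string.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <strings.h>

/* Compare the agent host named by the NETSOLVE_AGENT environment
   variable with the agent host the server is configured to use
   (the @AGENT entry of server_config, passed in as a plain string).
   Host names are compared case-insensitively; no DNS lookup is done,
   so a short name and its fully qualified form count as different.
   Returns 1 if they match, 0 if they conflict, -1 if NETSOLVE_AGENT
   is not set. */
int agent_matches_config(const char *configured_agent)
{
    const char *env_agent = getenv("NETSOLVE_AGENT");
    assert(configured_agent != NULL);
    if (env_agent == NULL)
        return -1;
    return strcasecmp(env_agent, configured_agent) == 0;
}
```

A result of 0 corresponds to the conflict described in cause 1; a result of 1 suggests looking at cause 2 instead.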

Known Deficiencies in NetSolve, version 1.4

The following caveats exist in the NetSolve code, and will be fixed in an upcoming release.

  1. If NWS is enabled in NetSolve (configure --with-nwsdir=NWS_DIR), NetSolve assumes that $NWS_DIR/bin/ARCH/ is in your path.
  2. PETSc, Aztec, and ITPACK must all be installed in order to use the sparse_iterative_solve PDF. Likewise, MA28 and SuperLU must both be installed in order to use the sparse_direct_solve PDF. The sparse wrappers need to be modified so that each PDF can be enabled when only one of its libraries is installed.
  3. Printed error messages are inconsistent across the C, Fortran, Matlab, Mathematica, and Windows client interfaces. Some messages lack the "NS:" prefix, and messages from the Windows client interface are still prefixed with "NetSolve:".
  4. Missing run-time error message for NetSolveUnknownHandle (-40) error in src/CoreFunctions/netsolveerror.c.
  5. The Mathematica ScaLAPACK interface fails when the number of right-hand sides (RHS) is greater than 1; the transpose routine is suspected when the matrix is not square.
  6. When 'sparse_iterative_solve', 'ITPACK', ... is invoked, an "Invalid argument" message coming from the ITPACK routine SSORI is sent to stderr (nsserver.log). This needs further investigation.
  7. When multiple servers run within the same tree and no log file is explicitly chosen, the newest server takes over the log file, and messages from the other servers are no longer logged. You should explicitly direct the log of each server to a unique file. Open question: should all server log information be combined into one log file, or should a separate log be maintained for each server?
  8. There is currently no limit on the size of the nsserver.log and nsagent.log files. We should incorporate some mechanism to limit the size of those files, and have it start overwriting the file at a certain point.
  9. Benchmarking anomaly in NetSolve/src/Server/kflops.c.
  10. check_server timing bug.
  11. Unexplained anomalous behavior with workload reporting.
  12. pdfgui requires Java 1.2 or later.
  13. Compiler warning messages need to be cleaned up.
  14. Memory leaks.
  15. Job submission is case-sensitive for 'PETSC', 'AZTEC', 'ITPACK', 'SUPERLU', and 'MA28'. Making it case-insensitive just requires a strcasecmp() in NetSolve/problems/sparse_iterative_solve and sparse_direct_solve.
  16. NetSolve/src/Examples/sparse_testers/itpack_tester/ references the old interface to 'itpack_solve', and ../itpack_tester/Makefile is hardwired for gcc.
  17. The size of problem_init.o grows with the number of PDF services enabled. Depending upon the amount of memory available on a given architecture, it may not be possible to enable all PDFs.
  18. ARPACK pdf was not tested for this release. ARPACK enablement requires Chao Yang's SPEIG distribution to be included with the standard ARPACK distribution.
  19. @COMP has limited functionality in PDFs, and its functionality needs to be expanded.
  20. @COMPLEXITY expression is too limited. We need to be able to express fractions (e.g., 2/3 n^3), and thus need 3 integers to be specified (numerator, denominator, and exponent) instead of the existing 2 integers. It would also be helpful to have a "memory" complexity as well as the "flop" complexity to more easily refine the scheduling process.
  21. IRIX WORKLOAD SET ARBITRARILY HIGH: The server workload reported to the agent for use in scheduling is statically set to 58 on any IRIX platform. This high value ensures that an IRIX machine in a NetSolve grid will rarely be assigned to service a request, unless it is the only server configured to service a particular problem. As a consequence, the agent has no true notion of the load on an IRIX machine, and all IRIX machines look alike to it.
  22. IRIX Matlab: Building with "make matlab" on IRIX produces duplicate symbol warnings, and the compile fails. This is under investigation.
  23. IBP enablement within NetSolve was tested with IBP v1.0.1, which is no longer available on the IBP website. NetSolve should work with IBP v1.0.2; however, we have not yet tested with that version.
  24. The Windows client software currently works only with Windows 2000. It will not work on Windows 98 or earlier.
  25. The port used by the NetSolve Agent is currently hardwired to that specified in NetSolve/include/general.h. In a future release, the port number for the agent will be configurable when the agent is started.
  26. If more than one agent is running in a NetSolve pool, the remaining agents do not properly update their information when one agent is taken down, so the information they report becomes inconsistent.
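The strcasecmp() fix proposed in item 15 can be sketched as a case-insensitive table lookup. The table, the function name find_sparse_library and its return convention are hypothetical; the real code in NetSolve/problems/sparse_iterative_solve and sparse_direct_solve may be organized differently, and there each PDF would accept only its own libraries.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Library names from item 15. sparse_iterative_solve uses PETSc, Aztec
   and ITPACK, while sparse_direct_solve uses SuperLU and MA28; they share
   one table here only to keep the sketch short. */
static const char *const sparse_libraries[] = {
    "PETSC", "AZTEC", "ITPACK", "SUPERLU", "MA28"
};

/* Return the index of `name` in sparse_libraries, ignoring case
   ("petsc", "Petsc" and "PETSC" all match), or -1 if unknown. */
int find_sparse_library(const char *name)
{
    size_t i;
    size_t count = sizeof sparse_libraries / sizeof sparse_libraries[0];
    assert(name != NULL);
    for (i = 0; i < count; i++)
        if (strcasecmp(name, sparse_libraries[i]) == 0)
            return (int)i;
    return -1;
}
```

With this lookup, a request naming 'superlu' or 'SuperLU' is treated the same as 'SUPERLU', while an unrecognized name still yields -1 and can be rejected as before.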
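The three-integer form proposed in item 20 can be illustrated with a small calculation: with numerator a, denominator b and exponent c, the estimated flop count for a problem of size n is (a/b) n^c, so the example 2/3 n^3 becomes (2, 3, 3). The struct and function names below are hypothetical, and this is only an illustration of the arithmetic, not the syntax of the @COMPLEXITY expression.

```c
#include <assert.h>

/* Hypothetical three-integer complexity from item 20:
   flops(n) = (numerator / denominator) * n^exponent. */
struct complexity {
    int numerator;
    int denominator;
    int exponent;
};

/* Estimated flop count for a problem of size n. Multiplying by the
   numerator before dividing avoids rounding 2/3 on its own, so
   whole-number results such as 2/3 * 300^3 come out exact for
   moderate n. */
double estimate_flops(struct complexity c, double n)
{
    double power = 1.0;
    int k;
    assert(c.denominator != 0 && c.exponent >= 0);
    for (k = 0; k < c.exponent; k++)
        power *= n;              /* n^exponent by repeated multiplication */
    return c.numerator * power / c.denominator;
}
```

Adding the denominator is what lets a fractional coefficient such as 2/3 be expressed using integers only.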