Parse debug setup statements

ss_debug_env is a function defined in ssdebug.c.

Synopsis:

herr_t ss_debug_env(MPI_Comm UNUSED_SERIAL comm, const char *s_const)

Formal Arguments:

  • comm: The library communicator. Pass any integer value when using a version of SSlib compiled without MPI support.
  • s_const: Optional string to use instead of looking at the SSLIB_DEBUG environment variable. Pass null to use SSLIB_DEBUG instead. Passing an empty string (or all white space) accomplishes nothing. Task zero broadcasts this string to all the other tasks.
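The choice between `s_const` and the environment variable can be pictured with a small stand-alone sketch. This is not SSlib's code: the helper names `spec_is_blank` and `choose_debug_spec` are hypothetical, and the broadcast from task zero to the other tasks is deliberately left out.

```c
#include <ctype.h>
#include <stdlib.h>

/* Return nonzero if s is empty or contains only white space. */
static int spec_is_blank(const char *s)
{
    for (; *s; s++) {
        if (!isspace((unsigned char)*s)) return 0;
    }
    return 1;
}

/* Hypothetical helper (not part of SSlib): a non-null s_const is used as-is,
 * otherwise the SSLIB_DEBUG environment variable is consulted. Returns NULL
 * when there is nothing to parse (unset, empty, or all white space). */
static const char *choose_debug_spec(const char *s_const)
{
    const char *spec = s_const ? s_const : getenv("SSLIB_DEBUG");
    if (!spec || spec_is_blank(spec)) return NULL;
    return spec;
}
```

Under these assumptions, `choose_debug_spec("pid")` returns the string it was given, while `choose_debug_spec("   ")` returns NULL, matching the statement that an empty or all-white-space string accomplishes nothing.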

Description: This function looks at the contents of the SSLIB_DEBUG environment variable, which is a semicolon-separated list of terms that control various debugging features. Valid terms are:

task=*n*: Controls which tasks will be affected by subsequent debugging terms. A value of n with an
initial plus sign will add task n to the list of selected tasks; a leading minus sign removes the task from the list; lack of a plus or minus makes task n the only selected task. The value all or none can also be supplied, which selects all tasks or no tasks, respectively. The value can also be a comma-separated list of task ranks, which acts the same as if multiple task terms had been specified (the plus or minus sign should be at the beginning of the list and applies to all values of the list).

error=*n*: When errors are pushed onto the error stack they are each given a unique (within a task)
identification number. When this error debugging term is specified, the debugger of choice is invoked when error number n is pushed onto the stack. If the equal sign and error number are omitted then the debugger is never started for an error, but the error stack will display the error identification numbers. Only one error number can be specified per task; if more flexibility is needed then the application can be run under a debugger with a breakpoint set in ss_error.
file=*name*: Selects the output file to use for subsequent debugging terms for the selected tasks. The
file will be created if it doesn’t exist or truncated if it does exist. The name is actually a printf format string and the first format specifier (if present) should be for an integer task number. If name is the word none then output is disabled; if name is a positive integer then the specified file descriptor is used without attempting to open it (this is useful if the descriptor was opened with the shell). If a number and name are both specified separated by a comma then the name is opened and dup’d to the desired file descriptor. If the name begins with a `<’ character then the file is opened read-only.
stop: The specified MPI task(s) will print their MPI rank and process ID and then suspend themselves,
giving an opportunity for a debugger to attach.
pause=``N``: The specified MPI task(s) will immediately pause for N seconds. This is useful
when a task needs to give a debugger (such as strace) time to automatically attach to child processes.

debugger=*name*: Specifies which debugger should be used. The default is `ddd’.

debug: The specified debugger (or ddd) is started for the affected task or tasks.
This probably only works on systems that have a /proc/self/exe link to the executable and the DISPLAY environment variable set properly for the affected task. If a non-default debugger is desired then the `debugger’ keyword must appear before this `debug’ keyword.
signal: Start the debugger when a task is about to die from certain signals (those that signify
a program error). The task is suspended (although other signal handlers might still be executed) and must be explicitly killed. The `debug’ keyword takes precedence over `signal’.
stack: Turn automatic error reporting on or off for selected tasks depending on the current setting
for the file descriptor. When off, errors are reported by return values as usual and the error stack contains information about the error, but the stack is not automatically printed. The default is that errors are printed to stderr.
pid: Print the process ID for all selected tasks. This is useful when various tools (such as
valgrind) print PIDs but have no way of knowing the MPI task number.

mpi: Do not register an MPI error handler in the ss_init call.

banner=``STR``: Display the specified string value on stderr when ss_init is about to return. This is
normally used in conjunction with the config file to notify users that they should recompile their application with a newer version of sslib.
commands: Enables the ss_debug calls that might appear in applications. The `file’ term should be used
before this term in order to specify from where the debug commands should be read (don’t forget to use the `<’ in front of the file name in order to open it for read-only). If no file is specified then SSlib attempts to read the commands from the stderr stream, which may cause the commands to be read from the controlling terminal in certain situations (but it’s usually better to be explicit by providing the `file=</dev/tty’ term). Specifying an empty file such as /dev/null has essentially the same effect as if the `commands’ term was not given.
warnings: For the selected MPI tasks, send all miscellaneous SSlib warning messages to the selected
file.
check=*what*: Turns on or off various categories of internal consistency checking, some of which incur
considerable runtime expense. The what is a comma-separated list of category names where that category of checking is turned off if introduced with a minus sign and on otherwise. Only selected tasks are affected. See the table below for a list of categories.
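As an illustration of the overall syntax, the following sketch splits a specification into its semicolon-separated terms and separates each term's name from its optional value. It is only a model of the format described above, not SSlib's parser; the type `debug_term_t` and the function `split_debug_terms` are hypothetical names.

```c
#include <string.h>

/* One parsed term: "name" or "name=value". */
typedef struct {
    char name[32];
    char value[128];   /* empty string when no '=' was present */
    int  has_value;    /* distinguishes "error" from "error=3" */
} debug_term_t;

/* Hypothetical helper (not part of SSlib): split a semicolon-separated
 * specification into terms. Writes at most max_terms entries and returns the
 * number stored. Empty terms are skipped and leading blanks are ignored.
 * Only the first '=' splits a term, so "check=sync=error" keeps "sync=error"
 * as its value. Overlong names and values are truncated. */
static int split_debug_terms(const char *spec, debug_term_t *terms, int max_terms)
{
    int n = 0;
    while (*spec && n < max_terms) {
        size_t len = strcspn(spec, ";");   /* length of the current term */
        const char *t = spec;
        size_t tlen = len;
        while (tlen > 0 && (*t == ' ' || *t == '\t')) { t++; tlen--; }
        if (tlen > 0) {
            const char *eq = memchr(t, '=', tlen);
            size_t nlen = eq ? (size_t)(eq - t) : tlen;
            size_t vlen = eq ? tlen - nlen - 1 : 0;
            if (nlen >= sizeof terms[n].name) nlen = sizeof terms[n].name - 1;
            if (vlen >= sizeof terms[n].value) vlen = sizeof terms[n].value - 1;
            memcpy(terms[n].name, t, nlen);
            terms[n].name[nlen] = '\0';
            if (eq) memcpy(terms[n].value, eq + 1, vlen);
            terms[n].value[vlen] = '\0';
            terms[n].has_value = (eq != NULL);
            n++;
        }
        spec += len;
        if (*spec == ';') spec++;
    }
    return n;
}
```

Because terms are processed in order, a caller would walk the resulting array from first to last, which is what lets earlier `task` and `file` terms affect the terms that follow them.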

The following internal consistency checking categories are defined. Some categories can take a comma-separated list of attributes separated from the category name by an equal sign. When a category is followed by an equal sign then it must be the last category listed for that check term, but additional categories can be specified with additional check terms.

sync: When turned on, SSlib will check for many situations where a call to ss_pers_modified (or
the macro SS_PERS_MODIFIED) was accidentally omitted by computing and caching checksums. If the error attribute is specified then such situations will be considered errors instead of just generating debugging information on the warning stream. If the bcast attribute is specified then information about which objects are transmitted will be displayed to the warning stream.
2pio: SSlib will display certain information about 2-phase I/O if this is turned on. For instance,
when aggregation tasks are chosen for a blob the mapping from dataset addresses to aggregators is displayed. The task setting doesn’t affect this flag since it’s always task zero that displays this collective information.
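The rules for the value of a check term (comma-separated categories, a leading minus sign to turn a category off, and attributes allowed only on the last category) can be sketched as follows. This is an illustrative model rather than SSlib's implementation; `check_cat_t` and `parse_check_value` are hypothetical names, and the attribute list is kept as an unparsed string.

```c
#include <string.h>

typedef struct {
    char name[16];    /* category name, e.g. "sync" or "2pio" */
    int  enabled;     /* 0 if introduced with a minus sign, 1 otherwise */
    char attrs[64];   /* comma-separated attributes, or "" if none */
} check_cat_t;

/* Hypothetical helper (not part of SSlib): parse the value of one check term.
 * Returns the number of categories stored, or -1 if the value is malformed. */
static int parse_check_value(const char *what, check_cat_t *cats, int max_cats)
{
    int n = 0;
    while (*what && n < max_cats) {
        check_cat_t *c = &cats[n];
        size_t len;
        c->enabled = 1;
        c->attrs[0] = '\0';
        if (*what == '-') { c->enabled = 0; what++; }
        len = strcspn(what, ",=");
        if (len == 0 || len >= sizeof c->name) return -1;
        memcpy(c->name, what, len);
        c->name[len] = '\0';
        n++;
        what += len;
        if (*what == '=') {
            /* A category followed by '=' must be the last one in the term,
             * so everything after the '=' is its attribute list. */
            what++;
            if (strlen(what) >= sizeof c->attrs) return -1;
            strcpy(c->attrs, what);
            break;
        }
        if (*what == ',') what++;
    }
    return n;
}
```

With this model, `sync=error,bcast` yields a single enabled category `sync` with attributes `error,bcast`, while `-sync,2pio` yields two categories, the first turned off and the second turned on.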

Return Value: Returns non-negative on success; negative on failure.

Parallel Notes: Collective across the library communicator. We do this because environment variables are sometimes only available at certain tasks (task zero of the library communicator must have the environment variable).

Example: Example 1: To start the DDD debugger on task 17:

 SSLIB_DEBUG='task=17;debug' ...

Example 2: To stop all tasks but task 17:

 SSLIB_DEBUG='task=-17;stop' ...
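The selection rules behind the task term in Example 2 can be modeled with a short sketch. It is not SSlib's code, and `apply_task_term` is a hypothetical name. Two points are assumptions: all tasks start out selected (inferred from Example 2, where removing task 17 leaves every other task stopped), and an unsigned comma-separated list selects exactly the listed ranks.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of SSlib): apply the value of one task term
 * to a selection array, where selected[i] != 0 means task i is selected.
 * Returns 0 on success and -1 on a malformed value or out-of-range rank.
 * On failure the array may already have been partially updated. */
static int apply_task_term(const char *value, int *selected, int ntasks)
{
    char mode = 0;   /* '+' adds, '-' removes, 0 replaces the selection */
    int i;

    if (strcmp(value, "all") == 0 || strcmp(value, "none") == 0) {
        int on = (value[0] == 'a');
        for (i = 0; i < ntasks; i++) selected[i] = on;
        return 0;
    }
    if (*value == '+' || *value == '-') mode = *value++;
    if (*value == '\0') return -1;
    if (mode == 0) {
        /* No sign: the listed ranks become the only selected tasks
         * (an assumption for lists of more than one rank). */
        for (i = 0; i < ntasks; i++) selected[i] = 0;
    }
    while (*value) {
        char *end;
        long rank = strtol(value, &end, 10);
        if (end == value || rank < 0 || rank >= ntasks) return -1;
        selected[rank] = (mode != '-');   /* the sign applies to every rank */
        value = end;
        if (*value == ',') value++;
        else if (*value != '\0') return -1;
    }
    return 0;
}
```

Starting from an all-selected array of 20 tasks, applying `-17` clears only entry 17, which corresponds to the set of tasks that the stop term in Example 2 would then affect.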

Example 3: To cause task 17 to report errors to a file named “task17.err” and no other task to report errors:

 SSLIB_DEBUG='file=none;stack;task=17;file=task17.err;stack' ...

Example 4: Cause HDF5 to emit tracing information to files like task001.trace, task002.trace, etc. The thing to watch out for here is that HDF5 gets initialized before SSlib and if file descriptor 99 is not open then tracing is disabled. So we rely on the shell to supply an initial file for descriptor 99 which SSlib will swap out from under HDF5. Until the swap occurs, all tasks will emit tracing to the shell-supplied file:

 SSLIB_DEBUG="file=99,task%03d.trace" HDF5_DEBUG=99,trace 99>tasks.trace ...
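The value `99,task%03d.trace` in Example 4 uses the "number and name" form of the file term. The different forms a file value can take might be told apart as in the sketch below. It is a model of the documented rules only; `file_target_t` and `classify_file_value` are hypothetical names, and whether SSlib accepts a `<` prefix after a descriptor number is an assumption of this sketch.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int disabled;       /* value was the word "none" */
    int fd;             /* requested descriptor, or -1 if none was given */
    int read_only;      /* name was introduced with '<' */
    const char *name;   /* points into the input; NULL if only a descriptor */
} file_target_t;

/* Hypothetical helper (not part of SSlib): classify the value of a file term
 * as "none", a bare descriptor, a "descriptor,name" pair, or a plain name. */
static file_target_t classify_file_value(const char *value)
{
    file_target_t t = {0, -1, 0, NULL};
    char *end;
    long fd;

    if (strcmp(value, "none") == 0) {
        t.disabled = 1;
        return t;
    }
    fd = strtol(value, &end, 10);
    /* Only a positive integer followed by end-of-string or a comma counts as
     * a descriptor; a name such as "2024.log" is left alone. */
    if (end != value && fd > 0 && fd <= INT_MAX && (*end == '\0' || *end == ',')) {
        t.fd = (int)fd;
        if (*end == '\0') return t;   /* use the descriptor without opening */
        value = end + 1;              /* the name follows the comma */
    }
    if (*value == '<') {
        t.read_only = 1;
        value++;
    }
    t.name = value;
    return t;
}
```

For Example 4 this gives descriptor 99 and the name format `task%03d.trace`, the case in which the named file is opened and then dup'd onto the descriptor.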

Example 5: Invoke a debugger on any task that fails an assertion or receives certain other normally fatal signals. Use gdb instead of the default ddd.

 SSLIB_DEBUG='debugger=gdb;signal' ...

Example 6: To cause each task to redirect its standard error output to its own file:

 SSLIB_DEBUG='file=2,stderr.%04d' ...
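Since the file name is a printf format string, each task ends up with its own file name in Example 6. The expansion can be sketched as below. The function names are hypothetical and this is not SSlib's code. The format check is an addition of this sketch, made because handing an environment-supplied string straight to a printf-style function is risky; the documentation does not say whether SSlib performs such a check.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero if fmt contains at most one conversion and that conversion
 * is a plain integer one such as %d or %04d. A literal "%%" is allowed. */
static int format_is_safe(const char *fmt)
{
    int conversions = 0;
    const char *p = fmt;
    while ((p = strchr(p, '%')) != NULL) {
        p++;
        if (*p == '%') { p++; continue; }          /* escaped percent sign */
        while (isdigit((unsigned char)*p)) p++;    /* optional zero flag and width */
        if (*p != 'd' && *p != 'i') return 0;
        if (++conversions > 1) return 0;
        p++;
    }
    return 1;
}

/* Hypothetical helper (not part of SSlib): expand a file name format for one
 * task rank. Returns 0 on success, -1 if the format is rejected or the
 * result does not fit in the output buffer. */
static int expand_task_filename(char *out, size_t size, const char *fmt, int rank)
{
    int n;
    if (!format_is_safe(fmt)) return -1;
    n = snprintf(out, size, fmt, rank);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With the format from Example 6, rank 17 expands to `stderr.0017`; with the format from Example 4, rank 1 expands to `task001.trace`.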

Example 7: To type commands interactively to SSlib one makes a call to ss_debug in the application and then uses SSLIB_DEBUG as follows:

 SSLIB_DEBUG='task=0;file=<commands.txt;commands' ...

Example 8: To turn off the warning/debug messages that are normally emitted from SSlib on the stderr stream one would do the following:

 SSLIB_DEBUG='file=/dev/null;warnings' ...

See Also: