Parse debug setup statements
ss_debug_env is a function defined in ssdebug.c.
Synopsis:
herr_t ss_debug_env(MPI_Comm UNUSED_SERIAL comm, const char *s_const)
Formal Arguments:
- comm: The library communicator. Pass any integer value when using a version of SSlib compiled without MPI support.
- s_const: Optional string to use instead of looking at the SSLIB_DEBUG environment variable. Pass null to use SSLIB_DEBUG instead. Passing an empty string (or all white space) accomplishes nothing. Task zero broadcasts this string to all the other tasks.
Description: This function looks at the contents of the SSLIB_DEBUG environment variable (or of s_const when it is supplied). The contents are a semicolon-separated list of terms that control various debugging features. Valid terms are:
- task=*n*: Controls which tasks will be affected by subsequent debugging terms. A value of n with an initial plus sign adds task n to the list of selected tasks; a leading minus sign removes the task from the list; lack of a plus or minus makes task n the only selected task. The value all or none can also be supplied, which selects all tasks or no tasks, respectively. The value can also be a comma-separated list of task ranks, which acts the same as if multiple task terms had been specified (the plus or minus sign should be at the beginning of the list and applies to all values of the list).
- error=*n*: When errors are pushed onto the error stack they are each given an identification number that is unique within a task. When this error debugging term is specified, the debugger of choice is invoked when error number n is pushed onto the stack. If the equal sign and error number are omitted then the debugger is never started for an error, but the error stack will display the error identification numbers. Only one error number can be specified per task; if more flexibility is needed then the application can be run under a debugger with a breakpoint set in ss_error.
- file=*name*: Selects the output file to use for subsequent debugging terms for the selected tasks. The file will be created if it doesn't exist or truncated if it does exist. The name is actually a printf format string and the first format specifier (if present) should be for an integer task number. If name is the word none then output is disabled; if name is a positive integer then the specified file descriptor is used without attempting to open it (this is useful if the descriptor was opened by the shell). If a number and name are both specified, separated by a comma, then the name is opened and dup'd to the desired file descriptor. If the name begins with a '<' character then the file is opened read-only.
- stop: The specified MPI task(s) will print their MPI rank and process ID and then suspend themselves, giving an opportunity for a debugger to attach.
- pause=``N``: The specified MPI task(s) will immediately pause for N seconds. This is useful when a task needs to give a debugger (such as strace) time to automatically attach to child processes.
- debugger=*name*: Specifies which debugger should be used. The default is 'ddd'.
- debug: The specified debugger (or ddd) is started for the affected task or tasks. This probably only works on systems that have a /proc/self/exe link to the executable and the DISPLAY environment variable set properly for the affected task. If a non-default debugger is desired then the 'debugger' keyword must appear before this 'debug' keyword.
- signal: Start the debugger when a task is about to die from certain signals (those that signify a program error). The task is suspended (although other signal handlers might still be executed) and must be explicitly killed. The 'debug' keyword takes precedence over 'signal'.
- stack: Turn automatic error reporting on or off for selected tasks depending on the current setting for the file descriptor. When off, errors are reported by return values as usual and the error stack contains information about the error, but the stack is not automatically printed. The default is that errors are printed to stderr.
- pid: Print the process ID for all selected tasks. This is useful when various tools (such as valgrind) print PIDs but have no way of knowing the MPI task number.
- mpi: Do not register an MPI error handler in the ss_init call.
- banner=``STR``: Display the specified string value on stderr when ss_init is about to return. This is normally used in conjunction with the config file to notify users that they should recompile their application with a newer version of sslib.
- commands: Enables the ss_debug calls that might appear in applications. The 'file' term should be used before this term in order to specify from where the debug commands should be read (don't forget to use the '<' in front of the file name in order to open it read-only). If no file is specified then SSlib attempts to read the commands from the stderr stream, which may cause the commands to be read from the controlling terminal in certain situations (but it's usually better to be explicit by providing the 'file=</dev/tty' term). Specifying an empty file such as /dev/null has essentially the same effect as if the 'commands' term was not given.
- warnings: For the selected MPI tasks, send all miscellaneous SSlib warning messages to the selected file.
- check=*what*: Turns on or off various categories of internal consistency checking, some of which incur considerable runtime expense. The what is a comma-separated list of category names, where a category of checking is turned off if introduced with a minus sign and on otherwise. Only selected tasks are affected. See the table below for a list of categories.
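The plus, minus, and unsigned forms of the task term can be illustrated with a minimal sketch. The helper apply_task_term below is hypothetical and is not SSlib code. It assumes that an unsigned comma-separated list selects exactly the listed ranks, a detail the description above does not spell out, and it leaves the selection partially updated when it rejects a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not SSlib code: apply the value of one task= term to
 * a per-rank selection array of length ntasks. Returns 0 on success and -1
 * on a malformed or out-of-range value (the array may then be partially
 * updated). */
int apply_task_term(const char *value, bool *selected, int ntasks)
{
    assert(value != NULL && selected != NULL && ntasks >= 0);

    /* "all" and "none" set every rank at once. */
    if (strcmp(value, "all") == 0 || strcmp(value, "none") == 0) {
        bool on = (value[0] == 'a');
        for (int i = 0; i < ntasks; i++) selected[i] = on;
        return 0;
    }

    /* An optional sign at the start applies to every rank in the list. */
    char mode = '=';
    if (*value == '+' || *value == '-') mode = *value++;
    if (*value == '\0') return -1;

    /* Assumption: without a sign, the listed ranks become the only ones
     * selected, so the previous selection is cleared first. */
    if (mode == '=')
        for (int i = 0; i < ntasks; i++) selected[i] = false;

    while (*value) {
        char *end;
        long rank = strtol(value, &end, 10);
        if (end == value || rank < 0 || rank >= ntasks) return -1;
        selected[rank] = (mode != '-');
        if (*end == ',') end++;
        else if (*end != '\0') return -1;
        value = end;
    }
    return 0;
}
```

Starting from a selection in which every task is selected, applying the value "-17" leaves all tasks but task 17 selected, which is the situation Example 2 relies on.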
The following internal consistency checking categories are defined. Some categories can take a comma-separated list of attributes, separated from the category name by an equal sign. When a category is followed by an equal sign then it must be the last category listed for that check term, but additional categories can be specified with additional check terms.
- sync: When turned on, SSlib will check for many situations where a call to ss_pers_modified (or the macro SS_PERS_MODIFIED) was accidentally omitted by computing and caching checksums. If the error attribute is specified then such situations will be considered errors instead of just generating debugging information on the warning stream. If the bcast attribute is specified then information about which objects are transmitted will be displayed to the warning stream.
- 2pio: SSlib will display certain information about 2-phase I/O if this is turned on. For instance, when aggregation tasks are chosen for a blob the mapping from dataset addresses to aggregators is displayed. The task setting doesn't affect this flag since it's always task zero that displays this collective information.
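The syntax of a check value (categories separated by commas, a leading minus sign to turn a category off, and an optional attribute list after an equal sign on the last category) can be sketched as follows. The struct check_item and the function parse_check_value are hypothetical names chosen for illustration and the fixed buffer sizes are arbitrary; SSlib's own parser may behave differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one parsed category; not an SSlib type. */
struct check_item {
    char name[32];   /* category name, e.g. "sync" */
    bool on;         /* false when introduced with a minus sign */
    char attrs[64];  /* text after '=', or "" when absent */
};

/* Parse the value of one check= term into at most max items. Returns the
 * number of items parsed, or -1 if a name is empty, a name or attribute
 * list is too long, or there are more than max categories. */
int parse_check_value(const char *value, struct check_item *out, int max)
{
    assert(value != NULL && out != NULL);
    int n = 0;

    while (*value) {
        if (n >= max) return -1;
        struct check_item *item = &out[n];

        item->on = true;
        if (*value == '-') { item->on = false; value++; }

        /* The category name runs up to the next ',' or '='. */
        size_t len = strcspn(value, ",=");
        if (len == 0 || len >= sizeof item->name) return -1;
        memcpy(item->name, value, len);
        item->name[len] = '\0';
        value += len;

        item->attrs[0] = '\0';
        if (*value == '=') {
            /* A category followed by '=' is the last one in the term, so
             * the remainder, commas included, is its attribute list. */
            value++;
            size_t alen = strlen(value);
            if (alen >= sizeof item->attrs) return -1;
            memcpy(item->attrs, value, alen + 1);
            value += alen;
        } else if (*value == ',') {
            value++;
        }
        n++;
    }
    return n;
}
```

Under these assumptions the value "-2pio,sync=error,bcast" yields two items: 2pio turned off with no attributes, and sync turned on with the attribute list "error,bcast".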
Return Value: Returns non-negative on success; negative on failure.
Parallel Notes: Collective across the library communicator. We do this because environment variables are sometimes only available at certain tasks (task zero of the library communicator must have the environment variable).
Example:
Example 1: To start the DDD debugger on task 17:
    SSLIB_DEBUG='task=17;debug' ...
Example 2: To stop all tasks but task 17:
    SSLIB_DEBUG='task=-17;stop' ...
Example 3: To cause task 17 to report errors to a file named "task17.err" and no other task to report errors:
    SSLIB_DEBUG='file=none;stack;task=17;file=task17.err;stack' ...
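Because terms are processed in order, it can help to see a debug string such as the one in Example 3 broken into its keyword and value pairs. The sketch below is only an illustration: for_each_term, term_fn, and print_term are hypothetical names, the 256-byte term limit is arbitrary, and no white space trimming is attempted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: receives each term's keyword and its value
 * (NULL when the term has no '='). */
typedef void (*term_fn)(const char *key, const char *value, void *ctx);

/* Split a debug string on ';' and hand each non-empty term to fn, in order.
 * Returns the number of terms visited, or -1 if a term is too long. */
int for_each_term(const char *s, term_fn fn, void *ctx)
{
    assert(s != NULL && fn != NULL);
    int count = 0;

    while (*s) {
        size_t len = strcspn(s, ";");
        if (len > 0) {
            char term[256];
            if (len >= sizeof term) return -1;
            memcpy(term, s, len);
            term[len] = '\0';

            /* Split at the first '=' only, so values may contain '='. */
            char *eq = strchr(term, '=');
            if (eq) *eq++ = '\0';
            fn(term, eq, ctx);
            count++;
        }
        s += len;
        if (*s == ';') s++;
    }
    return count;
}

/* Example callback: print one term per line. */
void print_term(const char *key, const char *value, void *ctx)
{
    (void)ctx;
    printf("%s -> %s\n", key, value ? value : "(no value)");
}
```

Calling for_each_term("file=none;stack;task=17;file=task17.err;stack", print_term, NULL) visits five terms. The first stack term therefore acts on the default task selection with output disabled, and the second acts on task 17 alone with task17.err as the output file.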
Example 4: Cause HDF5 to emit tracing information to files like task001.trace, task002.trace, etc. The thing to watch out for here is that HDF5 gets initialized before SSlib and if file descriptor 99 is not open then tracing is disabled. So we rely on the shell to supply an initial file for descriptor 99 which SSlib will swap out from under HDF5. Until the swap occurs, all tasks will emit tracing to the shell-supplied file:
    SSLIB_DEBUG="file=99,task%03d.trace" HDF5_DEBUG=99,trace 99>tasks.trace ...
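Since the file name is a printf format string, each task ends up with its own file name. A minimal sketch of that expansion using the standard snprintf call is shown below. The helper expand_task_filename is hypothetical, and it assumes the template holds at most one conversion, which must be an integer conversion such as %d or %03d.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not SSlib code: expand a file= name template for one
 * task rank. Assumes the template contains at most one conversion and that
 * it is an integer conversion. Returns 0 on success, -1 if the result does
 * not fit in buf. */
int expand_task_filename(char *buf, size_t size, const char *tmpl, int rank)
{
    assert(buf != NULL && tmpl != NULL && size > 0);
    int n = snprintf(buf, size, tmpl, rank);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With the template "task%03d.trace" and rank 1 the buffer receives "task001.trace". Passing a template that comes from the environment straight to a printf-style function is only safe when the template is trusted, which is why the sketch states its assumption about the conversion explicitly.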
Example 5: Invoke a debugger on any task that fails an assertion or receives certain other normally fatal signals. Use gdb instead of the default ddd.
    SSLIB_DEBUG='debugger=gdb;signal' ...
Example 6: To cause each task to redirect its standard error output to its own file:
    SSLIB_DEBUG='file=2,stderr.%04d' ...
Example 7: To type commands interactively to SSlib, make a call to ss_debug in the application and then use SSLIB_DEBUG as follows:
    SSLIB_DEBUG='task=0;file=<commands.txt;commands' ...
Example 8: To turn off the warning/debug messages that are normally emitted from SSlib on the stderr stream, do the following:
    SSLIB_DEBUG='file=/dev/null;warnings' ...
See Also:
- SS_PERS_MODIFIED: 7.29: Mark object as modified
- ss_debug: 22.1: Enter an interactive debugging loop
- ss_error: 2.5: Start debugger for error
- ss_init: 2.8: Initialize the library
- ss_pers_modified: 7.19: Mark object as modified
- Debugging: Introduction for current chapter