Strings

Variable-length strings stored in persistent objects are manipulated through the SSlib string interface and use the datatype ss_string_t, which is opaque to the client. Keeping the type opaque allows the implementation of persistent object strings to change as needed to keep pace with functionality and performance improvements in the HDF5 string datatype.
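The benefit of keeping the string type opaque can be illustrated with a minimal C sketch of the usual opaque-handle pattern. The type demo_string and the functions demo_string_new, demo_string_len, demo_string_bytes and demo_string_free are hypothetical stand-ins, not the SSlib API. Client code that calls only these accessors keeps compiling unchanged if the private struct layout is later replaced.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- Public view (hypothetical header; not the SSlib API) ----
 * Clients see only an incomplete type, so they cannot depend on its layout. */
typedef struct demo_string demo_string;

demo_string *demo_string_new(const char *bytes, size_t len);
size_t       demo_string_len(const demo_string *s);
const char  *demo_string_bytes(const demo_string *s);
void         demo_string_free(demo_string *s);

/* ---- Private implementation: free to change without touching clients ---- */
struct demo_string {
    size_t len;    /* explicit byte count; no NUL terminator is required */
    char  *bytes;  /* heap copy of the value */
};

demo_string *demo_string_new(const char *bytes, size_t len)
{
    demo_string *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->bytes = malloc(len ? len : 1);   /* avoid implementation-defined malloc(0) */
    if (!s->bytes) { free(s); return NULL; }
    if (len) memcpy(s->bytes, bytes, len);
    s->len = len;
    return s;
}

size_t demo_string_len(const demo_string *s)     { assert(s); return s->len; }
const char *demo_string_bytes(const demo_string *s) { assert(s); return s->bytes; }

void demo_string_free(demo_string *s)
{
    if (s) { free(s->bytes); free(s); }
}
```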

As it turns out, HDF5 cannot write variable-length strings in parallel (as of version 1.7.3, 2003-09-12). SSlib has therefore already had to change its implementation: all character strings for all objects of a particular scope are stored in an extendible “Strings” dataset of type H5T_NATIVE_UCHAR in that same scope. Any object that contains a variable-length string stores an index into the “Strings” dataset and, while the object is in memory, also a pointer directly to the string value. An opaque HDF5 datatype represents the string in memory, and a registered conversion function allocates or finds the string in the “Strings” dataset during I/O. The one drawback of this approach is that HDF5-level tools do not know that the stored value is an index into the “Strings” dataset rather than the character string itself.
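The layout described above, in which an object holds an index into a per-scope byte array plus a memory-only pointer, can be sketched in plain C without HDF5. Here a growable buffer stands in for the extendible “Strings” dataset, and strtab, strref, strtab_append and strtab_resolve are hypothetical names. Storing the length next to the offset is an assumption; the text does not say how string lengths are recorded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a scope's extendible "Strings"
 * dataset (a 1-D array of unsigned char in the file). */
typedef struct {
    unsigned char *data;
    size_t         used;
    size_t         cap;
} strtab;

/* Hypothetical per-object string reference. Keeping len next to offset
 * is an assumption made for this sketch. */
typedef struct {
    size_t               offset;  /* index into the Strings buffer (file-visible) */
    size_t               len;     /* byte count (assumed) */
    const unsigned char *ptr;     /* memory-only cache; never written to file */
} strref;

/* Append len bytes to the table and describe them in *ref. Returns 0 on success. */
int strtab_append(strtab *t, const void *bytes, size_t len, strref *ref)
{
    if (t->used + len > t->cap) {
        size_t ncap = t->cap ? t->cap : 64;
        while (ncap < t->used + len) ncap *= 2;
        unsigned char *nd = realloc(t->data, ncap);
        if (!nd) return -1;
        t->data = nd;
        t->cap  = ncap;
    }
    if (len) memcpy(t->data + t->used, bytes, len);
    ref->offset = t->used;
    ref->len    = len;
    ref->ptr    = NULL;   /* resolved later: growth may move the buffer */
    t->used    += len;
    return 0;
}

/* Recompute the cached pointer from the stored index, roughly what a
 * read-side conversion would do. Call again after any append. */
const unsigned char *strtab_resolve(const strtab *t, strref *ref)
{
    assert(ref->offset + ref->len <= t->used);
    ref->ptr = t->data ? t->data + ref->offset : NULL;
    return ref->ptr;
}
```

Because growing the buffer can move it, the sketch recomputes the cached pointer from the stored offset instead of holding pointers across appends; only the offset would be meaningful once the bytes are on disk.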

When a scope is opened, all tasks initially have the same contents in the variable-length string buffer, which is read by ss_string_boot. As execution progresses, different tasks add different strings to the buffer in different orders, and the tasks' buffers fall out of sync. When the objects of a scope are synchronized, every task is guaranteed to hold a valid “Strings” buffer, although the new values may appear in a different order on different tasks. The ss_string_flush function chooses one of the scope's tasks to write the string data back to the file.
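Why the tasks fall out of sync can be shown with two in-memory buffers that start from identical booted contents and then receive the same two strings in opposite orders. The task_buf type and the task_append and divergence_demo functions are hypothetical; they model only the offsets and involve neither MPI nor HDF5.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-task copy of a scope's Strings buffer (fixed size for brevity). */
typedef struct {
    char   data[256];
    size_t used;
} task_buf;

/* Append len bytes to one task's buffer and return the offset they landed at. */
size_t task_append(task_buf *b, const char *s, size_t len)
{
    assert(b->used + len <= sizeof b->data);
    memcpy(b->data + b->used, s, len);
    size_t off = b->used;
    b->used += len;
    return off;
}

/* Both tasks start from identical booted contents, then append "x" and "yy"
 * in opposite orders. Reports where "x" ends up on each task. */
void divergence_demo(size_t *x_on_task0, size_t *x_on_task1)
{
    task_buf t0 = {{0}, 0};
    task_append(&t0, "boot", 4);   /* contents shared by all tasks at boot */
    task_buf t1 = t0;              /* task 1 starts with an identical copy */

    *x_on_task0 = task_append(&t0, "x", 1);   /* task 0: "x" then "yy" */
    task_append(&t0, "yy", 2);

    task_append(&t1, "yy", 2);                /* task 1: "yy" then "x" */
    *x_on_task1 = task_append(&t1, "x", 1);
}
```

Calling divergence_demo reports offset 4 for "x" on task 0 and offset 6 on task 1: the same string value has a different index on each task.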

SSlib variable-length strings use an explicit length rather than a NUL character to mark the end of a string, and can therefore store byte strings that contain embedded NUL characters.
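The effect of marking the end with a length rather than a NUL can be shown with a hypothetical length-delimited string type, lstring, and comparison function, lstring_equal; neither is part of SSlib. The C library's strlen stops at the first NUL, whereas a length-based comparison examines every byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-delimited byte string (not the SSlib type):
 * len, not a NUL terminator, marks the end. */
typedef struct {
    const char *bytes;
    size_t      len;
} lstring;

/* Compare two length-delimited strings byte for byte, embedded NULs included. */
int lstring_equal(lstring a, lstring b)
{
    assert((a.len == 0 || a.bytes) && (b.len == 0 || b.bytes));
    return a.len == b.len && (a.len == 0 || memcmp(a.bytes, b.bytes, a.len) == 0);
}
```

For the three bytes 'a', NUL, 'b', strlen returns 1 while the stored length is 3, and lstring_equal distinguishes "a\0b" from "a\0c" even though NUL-terminated string functions see both as "a".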