I/O

The functions of this programming interface deal with I/O to and from an HDF5 file. These are mostly for non-table data such as application data for fields and relations. SSlib needs to support the following goals:

Targeting: Data can be targeted for a specific architecture during write operations that create a new dataset. This is useful when data is created on one architecture and will be read repeatedly on another, since the data conversion cost is then paid just once.
Precision: The data precision can be changed during read/write operations by requesting that the datatypes in the file differ in size from those in memory. For example, when writing a plot file the caller might supply double values but want them written as float values.
Task-Aggregation: When many tasks contribute non-overlapping data to a single HDF5 dataset, it may be advantageous to use message passing to aggregate the data onto a smaller subset of tasks, each of which can then contribute a larger, aligned block of data to the file system.
Field-Aggregation: Distinct fields and/or relations might want to share a single dataset in a non-overlapping manner in order to improve I/O performance. It should be up to the application how to organize the data in the dataset.
Sharing: Two or more fields or relations should be able to point to a common dataset if those fields or relations truly reference common data. Changing the data for one field or relation will also change the data for the other fields or relations.
Cross-file: It should be possible to store raw data in an SSlib database other than the one holding the relation or field.
Prewritten: It should be possible to point to field or relation data that was already written to a file by the client.
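
The Targeting and Precision goals can be illustrated without any library support. The sketch below does by hand the kind of conversion that HDF5 performs when the memory datatype passed to a write differs from the datatype of the dataset in the file: it narrows double values to single precision and stores them in big-endian byte order. The function name narrow_to_be_float is hypothetical and is not part of SSlib or HDF5, and the code assumes that float is a 32-bit IEEE 754 type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not an SSlib or HDF5 function.
 * Narrows n doubles to single precision and stores each value in `out`
 * as 4 big-endian bytes, so `out` must hold at least 4*n bytes.
 * Assumes float is a 32-bit IEEE 754 type. */
void narrow_to_be_float(const double *in, size_t n, unsigned char *out)
{
    assert(sizeof(float) == sizeof(uint32_t));

    for (size_t i = 0; i < n; i++) {
        float f = (float)in[i];            /* precision change: double -> float */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);    /* reinterpret the float's bit pattern */

        /* targeting: emit most significant byte first, whatever the host order */
        out[4 * i + 0] = (unsigned char)(bits >> 24);
        out[4 * i + 1] = (unsigned char)(bits >> 16);
        out[4 * i + 2] = (unsigned char)(bits >> 8);
        out[4 * i + 3] = (unsigned char)(bits);
    }
}
```

Doing the conversion once at write time is what lets a reader on the target architecture consume the bytes directly.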

In order to get all this to work, SSlib relies heavily on HDF5 support and therefore exposes the HDF5 API to the SSlib client. This allows the client to make full use of HDF5 capabilities, but in many cases the client would rather just let SSlib take care of all the storage details. These two competing design goals are handled by SSlib blob persistent objects (not to be confused with the old VBT blobs which served a similar but much simpler purpose).

A blob points to a buffer in memory, to part of a dataset in a file, or to both. When pointing to a dataset, the dataset must always be in the same file as the blob itself. An object such as a field in one SSlib file can store raw data in some other SSlib file by linking to a blob defined in that second file. All blob datasets have names in the blob storage group of the top-level scope, and the names are the decimal representation of the dataset object header address. This accomplishes three goals: (1) any blob dataset can be referred to with a single haddr_t value, (2) unique dataset names can be created with no communication, and (3) all blob datasets can be discovered with just a couple of HDF5 calls.
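
The naming rule can be sketched as a pair of helpers that map an address to a dataset name and back. This is only an illustration: blob_addr_t is a stand-in for haddr_t and is assumed here to be an unsigned 64-bit integer, and the names blob_name_from_addr and blob_addr_from_name are hypothetical rather than part of SSlib.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for haddr_t; assumed to be an unsigned 64-bit integer. */
typedef uint64_t blob_addr_t;

/* Write the decimal form of `addr` into `buf`.
 * Returns 0 on success, -1 if the buffer is too small. */
int blob_name_from_addr(blob_addr_t addr, char *buf, size_t size)
{
    assert(buf != NULL);
    int n = snprintf(buf, size, "%" PRIu64, addr);
    return (n > 0 && (size_t)n < size) ? 0 : -1;
}

/* Parse a dataset name back into an address.
 * Returns 0 on success, -1 if `name` is not a plain decimal number. */
int blob_addr_from_name(const char *name, blob_addr_t *addr)
{
    assert(addr != NULL);
    if (name == NULL || *name < '0' || *name > '9')
        return -1;                          /* reject empty names, signs, spaces */

    char *end = NULL;
    errno = 0;
    unsigned long long value = strtoull(name, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;                          /* overflow or trailing characters */

    *addr = (blob_addr_t)value;
    return 0;
}
```

Because the name is derived entirely from the address, each task can compute it locally, which is why no communication is needed to keep names unique.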

SSlib allows multiple blobs to share a single dataset. The dimensionality of a blob may be less than the dimensionality of the dataset in which it lives, allowing, for instance, one-dimensional blobs to be overlaid as rows of a two-dimensional dataset. See ss_blob_bind_f and ss_blob_space for details.
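
As a rough illustration of a one-dimensional blob occupying one row of a two-dimensional dataset, the sketch below models the dataset as a row-major array in memory and describes the row with a start/count pair, the same shape of information a hyperslab selection carries in HDF5. The type row_selection_t and the functions select_row and write_row are hypothetical and do not reflect the actual interface of ss_blob_bind_f or ss_blob_space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one row of a 2-D dataset. */
typedef struct {
    size_t start[2];   /* first element of the selection: {row, 0} */
    size_t count[2];   /* extent of the selection: {1, ncols} */
} row_selection_t;

/* Select row `row` of an nrows x ncols dataset.
 * Returns 0 on success, -1 if the row does not exist. */
int select_row(size_t nrows, size_t ncols, size_t row, row_selection_t *sel)
{
    assert(sel != NULL);
    if (row >= nrows || ncols == 0)
        return -1;
    sel->start[0] = row;
    sel->start[1] = 0;
    sel->count[0] = 1;
    sel->count[1] = ncols;
    return 0;
}

/* Copy a 1-D blob of sel->count[1] values into the selected row of a
 * row-major dataset that has `ncols` columns. */
void write_row(double *dataset, size_t ncols,
               const row_selection_t *sel, const double *blob)
{
    double *dest = dataset + sel->start[0] * ncols + sel->start[1];
    memcpy(dest, blob, sel->count[1] * sizeof *blob);
}
```

Two blobs bound to different rows in this way share the dataset without overlapping, while two blobs bound to the same row would see each other's changes.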

Since dataset creation and opening in HDF5 are operations that are collective across the file communicator, many blob operations are also collective across that communicator.

Blobs cannot be associated with a transient scope since there is no underlying HDF5 file in which to store the raw data.