Read data from a file

ss_blob_read is a function defined in ssblob.c.

Synopsis:

void * ss_blob_read(ss_blob_t *blob, hid_t iospace, unsigned flags, ss_prop_t UNUSED *props)

Formal Arguments:

  • blob: The blob from which data is to be read. Tasks that participate only to satisfy collectivity (members of the blob’s file communicator but not of the blob’s scope communicator) pass the top scope of the blob’s file instead.
  • iospace: An optional hyperslab describing the part of the blob that is to be read. The extent and selection are relative to the portion of the dataset owned by the blob, as described in some previous call to ss_blob_bind_f (perhaps in an earlier execution). If it is not specified (H5S_ALL, as in the examples below), all of the blob’s data is read. Such a selection is generally constructed by calling ss_blob_space and then applying a hyperslab selection to the result.
  • flags: Bit flags that control the read operation, combined with bitwise OR. The defined bits are described below.
  • props: See *Blob Properties*. (Unused at this time.)
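Because iospace coordinates are relative to the blob rather than to the whole dataset, a selection must be shifted by the blob’s position before it addresses dataset elements. The following is a one-dimensional sketch of that translation only, assuming a blob that owns a single contiguous run of dataset elements. The demo_blob_1d type and demo_blob_to_dataset function are hypothetical illustrations, not part of SSlib, which obtains this information from the earlier ss_blob_bind_f call.

```c
#include <stdbool.h>

/* Hypothetical 1-D model: the blob owns dataset elements
 * [offset, offset + extent). This is not an SSlib type. */
typedef struct {
    unsigned long long offset; /* first dataset element owned by the blob */
    unsigned long long extent; /* number of dataset elements owned by the blob */
} demo_blob_1d;

/* Translate a blob-relative selection of `count` elements starting at
 * `start` into a dataset-relative start offset. Returns false, leaving
 * *dataset_start untouched, if the selection extends past the blob. */
bool demo_blob_to_dataset(const demo_blob_1d *blob,
                          unsigned long long start,
                          unsigned long long count,
                          unsigned long long *dataset_start)
{
    if (start > blob->extent || count > blob->extent - start)
        return false; /* selection does not fit inside the blob */
    *dataset_start = blob->offset + start;
    return true;
}
```

For a blob that owns dataset elements 1000 through 1199, a blob-relative selection starting at element 50 addresses dataset element 1050.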

Description: Given a blob that is bound to a dataset, read the portion of the blob described by the iospace argument and return a pointer to the buffer that holds the data. That buffer is either the one bound to the blob with ss_blob_bind_m or one allocated by this function.

The flags argument determines specifics about the read operation and the following bits are defined:

SS_BLOB_COLLECTIVE: The operation is considered collective across the blob’s file communicator, and SSlib can use two-phase I/O in this situation. If this bit is not set, the operation is independent of any other task.
SS_BLOB_ASYNC: The I/O for this call can be performed asynchronously, which allows SSlib to use two-phase I/O even when the call is independent. Asynchronous reads are guaranteed to be complete after a call to ss_blob_flush on the affected dataset.
SS_BLOB_UNBIND: The memory and blob are disassociated from each other when this function returns, whether it succeeds or fails. However, if the failure occurs early enough (e.g., the blob is invalid), no disassociation occurs.
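Since the flags are independent bits, they are combined with a bitwise OR (Example 2 passes SS_BLOB_COLLECTIVE|SS_BLOB_UNBIND) and tested with a bitwise AND. The sketch below restates the rules above using hypothetical stand-in constants (DEMO_BLOB_*) with made-up values; the real SS_BLOB_* values come from SSlib’s headers.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for SS_BLOB_COLLECTIVE, SS_BLOB_ASYNC and
 * SS_BLOB_UNBIND; the real bit values are defined by SSlib. */
enum {
    DEMO_BLOB_COLLECTIVE = 1u << 0,
    DEMO_BLOB_ASYNC      = 1u << 1,
    DEMO_BLOB_UNBIND     = 1u << 2
};

/* The read is collective only when the COLLECTIVE bit is set;
 * otherwise it is independent of any other task. */
bool demo_is_collective(unsigned flags)
{
    return (flags & DEMO_BLOB_COLLECTIVE) != 0;
}

/* Two-phase I/O is permitted for collective calls, and ASYNC permits it
 * even for independent calls. (Permitted is not the same as performed.) */
bool demo_two_phase_permitted(unsigned flags)
{
    return (flags & (DEMO_BLOB_COLLECTIVE | DEMO_BLOB_ASYNC)) != 0;
}
```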

If the blob is not bound to memory, this function allocates a buffer but does not bind it to the blob; the data in that buffer has a native datatype computed from the dataset’s datatype with H5Dget_native_type. Otherwise (the blob is bound to memory), the memory and dataset datatypes must be conversion compatible.

The data spaces of the memory, the blob, the dataset, and the iospace must all be compatible. SSlib allows the dataset dimensionality to be larger than the blob dimensionality, but the memory, blob, and iospace data spaces must all be the same dimensionality.
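The dimensionality rule can be written as a simple predicate. These are hypothetical helpers for illustration, not SSlib functions. The second one adds HDF5’s own requirement that the memory and file selections of a read contain the same number of elements; that SSlib inherits this requirement is an assumption, not something stated here.

```c
#include <stdbool.h>

/* The dataset may have more dimensions than the blob, but the memory,
 * blob, and iospace data spaces must all have the same rank. */
bool demo_ranks_compatible(int dataset_rank, int blob_rank,
                           int memory_rank, int iospace_rank)
{
    return dataset_rank >= blob_rank &&
           memory_rank == blob_rank &&
           iospace_rank == blob_rank;
}

/* HDF5 reads require equal numbers of selected elements in memory and file. */
bool demo_counts_compatible(unsigned long long memory_selected,
                            unsigned long long iospace_selected)
{
    return memory_selected == iospace_selected;
}
```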

Return Value: Returns a pointer to memory containing the result data. If the blob was bound to memory then this is the same pointer that would be returned with a call to ss_blob_bound_m, otherwise this is memory that was allocated by SSlib and should be freed by the caller. Returns the null pointer on failure.

For asynchronous operations there is currently no good way to determine whether this particular read was successful, only whether the entire flush operation was successful.
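The ownership rule above determines cleanup: free the returned pointer only when the blob was not bound to memory. The following sketch models that contract with a hypothetical demo_read function standing in for ss_blob_read. It fills the buffer with zeros instead of performing any I/O and is not SSlib code.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a blob and its optional memory binding. */
typedef struct {
    void  *bound;  /* buffer bound to the blob, or NULL if unbound */
    size_t nbytes; /* size of the data to be read */
} demo_blob;

/* Mimics the documented contract: returns the bound buffer if there is one,
 * otherwise newly allocated memory that the caller must free; NULL on failure. */
void *demo_read(demo_blob *blob)
{
    void *dst = blob->bound ? blob->bound : malloc(blob->nbytes);
    if (!dst)
        return NULL;
    memset(dst, 0, blob->nbytes); /* placeholder for the actual read */
    return dst;
}

/* Caller-side pattern: release only memory that the read allocated. */
bool demo_read_and_release(demo_blob *blob)
{
    bool was_bound = blob->bound != NULL;
    void *data = demo_read(blob);
    if (!data)
        return false; /* the read failed */
    /* ... use data ... */
    if (!was_bound)
        free(data); /* allocated by the read, so the caller frees it */
    return true;
}
```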

Parallel Notes: Independent unless the SS_BLOB_COLLECTIVE bit is turned on in the flags argument, in which case the function should be called collectively across all tasks in the file communicator to which the blob belongs. The tasks that are part of the file communicator but not part of the blob’s scope communicator should pass the top scope of the blob’s file as the blob argument.
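The rule for choosing the blob argument in a collective call can be summarized as a small decision function. This is a hypothetical illustration that assumes the blob’s scope communicator is a subset of its file communicator; the enumeration and function names are not part of SSlib.

```c
#include <stdbool.h>

typedef enum {
    DEMO_NOT_PARTICIPATING, /* not in the file communicator: does not call */
    DEMO_PASS_BLOB,         /* in the blob's scope communicator: passes the blob */
    DEMO_PASS_TOP_SCOPE     /* in the file communicator only: passes the top scope */
} demo_blob_arg;

/* Which argument a task passes to a collective read. */
demo_blob_arg demo_collective_blob_arg(bool in_file_comm, bool in_scope_comm)
{
    if (!in_file_comm)
        return DEMO_NOT_PARTICIPATING;
    return in_scope_comm ? DEMO_PASS_BLOB : DEMO_PASS_TOP_SCOPE;
}
```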

The order of reads and writes is indeterminate when SSlib is doing asynchronous I/O and it is up to the caller to issue the appropriate ss_blob_flush calls to ensure an ordering.
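A toy model can make the asynchronous semantics concrete: queued reads deliver nothing until the flush, and the flush reports a single status for the whole batch, so a caller cannot tell which read failed. Everything here (demo_queue, demo_read_async, demo_flush) is hypothetical and single-process; it is not how SSlib implements asynchronous I/O.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_PENDING 16

/* One queued read: where the data goes and what the "file" will deliver. */
typedef struct {
    int *dst;   /* caller's buffer */
    int  value; /* simulated file contents */
    bool fails; /* simulate an I/O error for this read */
} demo_pending_read;

typedef struct {
    demo_pending_read ops[DEMO_MAX_PENDING];
    size_t n;
} demo_queue;

/* Queue a read; the caller's buffer is not touched until demo_flush. */
bool demo_read_async(demo_queue *q, int *dst, int value, bool fails)
{
    if (q->n == DEMO_MAX_PENDING)
        return false;
    q->ops[q->n++] = (demo_pending_read){ dst, value, fails };
    return true;
}

/* Complete every pending read and report only an aggregate status.
 * This model happens to complete reads in queue order; SSlib makes no
 * such promise, so callers must not depend on it. */
bool demo_flush(demo_queue *q)
{
    bool ok = true;
    for (size_t i = 0; i < q->n; i++) {
        if (q->ops[i].fails)
            ok = false;
        else
            *q->ops[i].dst = q->ops[i].value;
    }
    q->n = 0;
    return ok;
}
```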

Example: These examples are all one-dimensional for simplicity; a real application would probably use the one-dimensional versions of most of these functions. Their names are the same except that a 1 is appended, and their arguments differ. See ss_blob_read1 for examples.

Example 1: A single task reads all of the blob’s data into a static buffer. We assume that the dataset contains 100 elements of an integer datatype. SSlib will convert the data from the file datatype to an int type in memory.

 ss_blob_t b = SS_RELATION(rel)->d_blob;
 int data[100];
 hsize_t size = 100;
 hid_t mspace = H5Screate_simple(1, &size, NULL); // 100 contiguous elements in memory
 ss_blob_bind_m(&b, data, H5T_NATIVE_INT, mspace); // bind buffer to the blob
 if (!ss_blob_read(&b, H5S_ALL, SS_BLOB_UNBIND, NULL)) { // read data into the buffer, then unbind
     // handle the error
 }
 H5Sclose(mspace); // release the memory data space

Example 2: All tasks read all of the data collectively. Each task could instead read it independently by executing Example 1, but that could be very inefficient because SSlib and the lower layers cannot recognize that collective optimizations are possible.

 ss_blob_t b = SS_RELATION(rel)->d_blob;
 int data[100];
 hsize_t size = 100;
 hid_t mspace = H5Screate_simple(1, &size, NULL); // 100 contiguous elements in memory
 ss_blob_bind_m(&b, data, H5T_NATIVE_INT, mspace); // bind buffer to the blob
 if (!ss_blob_read(&b, H5S_ALL, SS_BLOB_COLLECTIVE|SS_BLOB_UNBIND, NULL)) { // read data into the buffer, then unbind
     // handle the error
 }
 H5Sclose(mspace); // release the memory data space

Example 3: Each task reads 50 elements from a blob that was associated with a relation; the tasks’ ranges do not overlap and are laid out in task rank order. Each task provides a buffer for the result. We assume that the blob’s data is one-dimensional, has a floating-point datatype, and is large enough to satisfy the read request.

 int self = ...;                  // task rank in blob's file communicator
 float buffer[50];                // the result buffer
 hsize_t start = 50 * self;       // starting offset relative to blob's data
 hsize_t size = 50;               // number of consecutive elements to read
 hid_t bspace;                    // blob's data space
 hid_t mspace = H5Screate_simple(1, &size, NULL); // describe memory to HDF5
 ss_blob_t *blob = SS_RELATION_P(rel, d_blob); // beware: blob pointer is temporary
 ss_blob_bind_m(blob, buffer, H5T_NATIVE_FLOAT, mspace); // bind buffer to the blob
 ss_blob_bound_f(blob, NULL, NULL, NULL, &bspace); // get the blob's data space
 H5Sselect_hyperslab(bspace, H5S_SELECT_SET, &start, NULL, &size, NULL); // describe partial read
 if (!ss_blob_read(blob, bspace, SS_BLOB_COLLECTIVE, NULL)) { // collective partial read
     // handle the error
 }
 ss_blob_bind_m(blob, NULL, 0, 0); // unbind memory from blob (could have used SS_BLOB_UNBIND)
 H5Sclose(bspace);
 H5Sclose(mspace);
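The offset arithmetic in Example 3 gives task r the blob elements from 50r up to but not including 50r + 50, so the ranges are disjoint by construction and the blob must hold at least 50 elements per task. A small sketch of that arithmetic, with the per-task count made a parameter, follows; the function names are hypothetical and no SSlib calls are involved.

```c
#include <stdbool.h>

/* First blob element read by task `rank` (Example 3 uses start = 50 * self). */
unsigned long long demo_task_start(int rank, unsigned long long per_task)
{
    return (unsigned long long)rank * per_task;
}

/* Consecutive tasks start exactly per_task elements apart, so their ranges
 * never overlap; the only requirement is that the last range fits in the blob. */
bool demo_partition_fits(int ntasks, unsigned long long per_task,
                         unsigned long long blob_extent)
{
    if (ntasks <= 0)
        return true; /* nothing is read */
    return demo_task_start(ntasks - 1, per_task) + per_task <= blob_extent;
}
```

With four tasks, for example, the blob must contain at least 200 elements.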

Issues: For a collective call in which all tasks read the same selection of the dataset, all want the same datatype, and all destinations are contiguous in memory, SSlib may perform an independent H5Dread and then broadcast the data to the other tasks. This optimization should eventually be moved into HDF5.
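The condition for the read-and-broadcast optimization can be expressed as a predicate over all tasks’ requests. In this hypothetical sketch the requests are supplied as an array; in a real parallel program each task knows only its own request, so the requests would first have to be exchanged (for example with MPI_Allgather). The demo_request fields are simplified stand-ins for an HDF5 selection and datatype, not SSlib structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified description of one task's collective read. */
typedef struct {
    unsigned long long sel_start; /* 1-D selection start in the blob */
    unsigned long long sel_count; /* number of selected elements */
    int  mem_type;                /* stand-in for the memory datatype */
    bool mem_contiguous;          /* destination is contiguous in memory */
} demo_request;

/* True when every task reads the same selection with the same datatype into
 * contiguous memory, so one task could read and broadcast the result. */
bool demo_can_read_and_broadcast(const demo_request *req, size_t ntasks)
{
    if (ntasks == 0)
        return false;
    for (size_t i = 0; i < ntasks; i++) {
        if (!req[i].mem_contiguous)
            return false;
        if (req[i].sel_start != req[0].sel_start ||
            req[i].sel_count != req[0].sel_count ||
            req[i].mem_type  != req[0].mem_type)
            return false;
    }
    return true;
}
```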

The two-phase I/O optimization for reads is not implemented.

See Also: