Introduction

The SAF Support Library (SSlib) grew out of the experience the Sets and Fields (SAF) team had with the former Vector Bundle Tables (VBT) layer and Data Sharability Layer (DSL) and, to some extent, with the Hierarchical Data Format version 5 (HDF5) library (support.hdfgroup.org/HDF5/doc/index.html) from NCSA. To increase performance, generalize some underlying functionality, and improve code engineering, we decided to rewrite most of VBT and DSL with these goals in mind:

Reduced Communication: We learned by experience that designing an API that requires underlying communication
makes it extremely difficult to optimize for performance at a later time, and that algorithms that require communication can be substantially slower than those that don’t. SSlib will therefore favor algorithms that reduce communication, and its API will be designed so that cases of repeated communication in the old VBT/DSL API can be performed just once and cases of related communication can be combined into single messages.
Variable Length Datatypes: The VBT design set aside a fixed size character array for every string, which
resulted in substantial wasted file space and lower bandwidth, and precluded the client from using arbitrary-length strings. SSlib will employ HDF5 variable-length datatypes to avoid these problems.
Transient Objects: The original VBT specification had no provision for creating objects that exist only in
memory, although eventually this was patched in using HDF5’s core virtual file driver. Transient objects are designed into SSlib.
Object Deletion: VBT did not allow for easy deletion of objects from the database. Although SSlib probably
won’t allow individual objects to be deleted, it will allow entire scopes to be deleted, freeing up space in the HDF5 file as provided by the HDF5 library and file format.
Every File a Database: SAF had a notion of supplemental data files that were pointed to by a single
master file, collectively called the database. It was not possible to open just a supplemental file; one always had to open the master file. SSlib will make no distinction between master and supplemental files; rather, every file will be a self-contained database. SAF allowed supplemental files to be missing; SSlib allows databases to be missing.
Partial Metadata Reads: VBT always read all the object definitions from the database whenever a database
was opened. SSlib will read only subsets of a file, called “scopes,” and will read them only when those scopes are accessed and only on the tasks that access them.
Interfile Object References: A VBT file could only refer to objects that were also in the same file. SSlib
files will have the capability to refer to objects that are in some other file.
Multiple References: In SSlib, two or more objects may make references to a common third object or to
common raw data, thus reducing the required storage.
Object Copying: Tools such as safdiff formerly needed extensive coding in order to copy an object (e.g.,
a field) from one database to another. SSlib will provide that functionality at a much lower layer. This also simplifies the implementation of Object Registries in SAF by moving much of that functionality downward in the software stack.
Common Error Handling: A code engineering aspect of SSlib is to generalize the HDF5 error handling
subsystem, turn it into a public programming interface, and use it for SSlib and eventually higher software layers. This unifies the error recording and reporting features of all layers involved.
Flexible File Decomposition: As mentioned already, SAF required all object metadata to be stored in a
single master file with optional supplemental files to hold raw field data. SSlib relaxes that constraint so that operational environments like Silo’s multi-file output are possible, in which the MPI job is partitioned into smaller subsets of tasks, each subset is responsible for a single database, and the databases are “sewn” together later.
Reduced Code Generation: SSlib replaces the more than 12,000 lines of vbtgen (a table parser and C code
generator) with a few hundred lines of Perl that do something very similar. In addition, the Perl script parses standard C typedefs instead of a custom language.
Better HDF5 Coupling: The DSL datatype interface (more than 12,000 lines of library code) will be replaced
with the HDF5 datatype interface plus a few additional functions that may migrate into the HDF5 library.
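
To make the Partial Metadata Reads goal more concrete, here is a minimal Python sketch of reading scopes on demand. It is only a toy model, not the SSlib API: the class name LazyDatabase, its scope() method, the read_count counter, and the scope and object names are all hypothetical, and a "read" is simulated by copying from an in-memory dictionary rather than from an HDF5 file.

```python
class LazyDatabase:
    """Toy model of a file whose object definitions are grouped into scopes.

    Hypothetical illustration only; none of these names come from SSlib.
    """

    def __init__(self, scopes_on_disk):
        # scopes_on_disk maps a scope name to its list of object definitions.
        self._on_disk = scopes_on_disk
        self._loaded = {}       # scopes that have been read so far
        self.read_count = 0     # number of simulated scope reads

    def scope(self, name):
        # Read a scope the first time it is accessed, then reuse the copy.
        if name not in self._loaded:
            self._loaded[name] = list(self._on_disk[name])
            self.read_count += 1
        return self._loaded[name]


db = LazyDatabase({
    "mesh": ["set:cells", "set:nodes"],
    "fields": ["field:pressure", "field:velocity"],
    "history": ["field:energy"],
})

# Opening the database reads no object definitions at all.
opened_reads = db.read_count

mesh_objects = db.scope("mesh")   # first access triggers one read
db.scope("mesh")                  # second access is served from memory

loaded_scopes = sorted(db._loaded)
```

In this sketch only the "mesh" scope is ever read; the "fields" and "history" scopes stay untouched until something asks for them, which is the behavior the goal describes in contrast to VBT reading every object definition at open time.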

The plots below show scalability and performance before and after optimization.

Pre-optimized raw data I/O aggregate bandwidth scalability

../../_images/plot01.jpg

Pre-optimized overall I/O aggregate bandwidth scalability

../../_images/plot02.jpg

Optimized raw data I/O aggregate bandwidth scalability

../../_images/plot03.jpg

Optimized overall I/O aggregate bandwidth scalability

../../_images/plot04.jpg

Comparison of SAF and Silo Ale3d restart file dump times

../../_images/plot05.jpg

Comparison of SAF Ale3d restart file dump times by functionality

../../_images/plot06.jpg