Introduction
The SAF Support Library (SSlib) grew out of the Sets and Fields (SAF) team's experience with the former
Vector Bundle Tables (VBT) layer and Data Sharability Layer (DSL) and, to some extent, with the Hierarchical
Data Format version 5 (HDF5) library from NCSA (support.hdfgroup.org/HDF5/doc/index.html). To increase
performance, generalize some underlying functionality, and improve code engineering, the team decided to
rewrite most of VBT and DSL with these goals in mind:
- Reduced Communication: Experience showed that an API requiring underlying communication is extremely difficult to optimize for performance later, and that algorithms requiring communication can be substantially slower than those that don't. SSlib will therefore use algorithms that reduce communication, and its API will be designed so that communication that was repeated in the old VBT/DSL API can be performed just once, and related communication can be combined into single messages.
- Variable Length Datatypes: The VBT design set aside a fixed-size character array for every string, which wasted substantial file space, lowered bandwidth, and prevented the client from using strings of arbitrary length. SSlib will employ HDF5 variable length datatypes to avoid these problems.
- Transient Objects: The original VBT specification had no provision for creating objects that exist only in memory, although this was eventually patched in using HDF5's core virtual file driver. Transient objects are designed into SSlib.
- Object Deletion: VBT did not allow easy deletion of objects from the database. Although SSlib probably won't allow individual objects to be deleted, it will allow entire scopes to be deleted, freeing space in the HDF5 file as provided by the HDF5 library and file format.
- Every File a Database: SAF had a notion of supplemental data files that were pointed to by a single master file, collectively called the database. A supplemental file could not be opened on its own; one always had to open the master file. SSlib will make no distinction between master and supplemental files; rather, every file will be a self-contained database. Just as SAF allowed supplemental files to be missing, SSlib allows databases to be missing.
- Partial Metadata Reads: VBT always read all the object definitions from the database whenever a database was opened. SSlib will only read subsets of a file, called “scopes”, and only when those scopes are accessed and only by the tasks accessing those scopes.
- Interfile Object References: A VBT file could only refer to objects that were also in the same file. SSlib files will have the capability to refer to objects that are in some other file.
- Multiple References: In SSlib, two or more objects may make references to a common third object or to common raw data, thus reducing the required storage.
- Object Copying: Tools such as safdiff formerly needed extensive coding to copy an object (e.g., a field) from one database to another. SSlib will provide that functionality at a much lower layer. This also simplifies the implementation of Object Registries in SAF by moving much of that functionality down the software stack.
- Common Error Handling: A code engineering aspect of SSlib is to generalize the HDF5 error handling subsystem, turn it into a public programming interface, and use it for SSlib and eventually for higher software layers. This unifies the error recording and reporting features of all layers involved.
- Flexible File Decomposition: As mentioned above, SAF required all object metadata to be stored in a single master file, with optional supplemental files holding raw field data. SSlib relaxes that constraint so that operational environments like SILO's multi-file output are possible: the MPI job is partitioned into smaller subsets of tasks, each subset is responsible for a single database, and the databases are “sewn” together later.
- Reduced Code Generation: SSlib replaces the more than 12,000 lines of vbtgen (a table parser and C code generator) with a few hundred lines of Perl that do something very similar. In addition, the Perl script parses standard C typedefs instead of a custom language.
- Better HDF5 Coupling: The DSL datatype interface (more than 12,000 lines of library code) will be replaced with the HDF5 datatype interface plus a few additional functions that may migrate into the HDF5 library.
The plots below show the scalability and performance improvements achieved, comparing results before and after optimization.
Pre-optimized raw data I/O aggregate bandwidth scalability
Pre-optimized overall I/O aggregate bandwidth scalability
Optimized raw data I/O aggregate bandwidth scalability
Optimized overall I/O aggregate bandwidth scalability
Comparison of SAF and Silo Ale3d restart file dump times
Comparison of SAF Ale3d restart file dump times by functionality