Introduction

This is the Sets and Fields (SAF, pronounced “safe”) Application Programming Interface (API) programmer’s reference manual. This manual is organized into chapters, each covering a different top-level set of functions (e.g. an object and its supporting methods) that SAF supports.

There is a decent introduction to the SAF data model in this paper:
github.com/markcmiller86/SAF/blob/master/src/safapi/docs/miller001.pdf
Various API design ideas were taken from this paper:
github.com/markcmiller86/SAF/blob/master/src/safapi/docs/necdc_2004_paper_30Nov04.pdf

SAF is designed first and foremost to support scalable I/O of shareable, scientific data.

The key words in this statement are scalable and shareable.

Scalable means that SAF is designed to operate with high performance from single-processor, workstation-class machines to large-scale, parallel computing platforms such as those in use in the ASCI program. In turn, this demands that SAF be portable across a variety of computing platforms. Currently, SAF operates in serial and parallel on DEC, Sun, Linux, IBM-SP2, Intel TeraFlops, and SGI-O2k (single box) systems. SAF is also supported in serial on Windows. A good measure of SAF’s performance and portability derives from its use of industry-standard software components such as HDF5 ( support.hdfgroup.org/HDF5/doc/index.html ) and MPI ( www.mpi-forum.org ). However, scalable I/O is just one of SAF’s primary goals. Making data shareable is another.

Shareable means that if one application uses SAF to write its data, other, wholly independent applications can easily read and interpret that data. Of course, it is not all that impressive if one application can simply read a bunch of bytes that another has written. Thus, the key to understanding what shareable means is the “and interpret” part. SAF is designed to make it easy for one scientific computing application to interpret another’s data. Even more, SAF is designed to enable this interpretation across a diverse and continually expanding gamut of scientific computing applications. In a nutshell, SAF lays the foundation for very large scale integration of scientific software.

The organizations involved in the development of SAF have plenty of experience with integration on smaller scales with products like netCDF, HDF, PATRAN, SEACAS, Silo and Exodus II. These technologies offer applications a menu of objects: some data structures (e.g. array, list, tree) and/or some mesh objects (e.g. structured-mesh, ucd-mesh, side-sets, etc.). For application developers who use these products, the act of sharing their data is one of browsing the menu. If they are lucky, they will find an object that matches their data and use it. If they are unlucky, they will have to modify their data to put it into a form that matches one of the objects on the menu.

Thus, former approaches to making data shareable suffer either from requiring all clients to use the same data structures and/or objects to represent their data or from resulting in an ever-expanding set of incrementally different data structures and/or objects to support each client’s slightly different needs. The result is that these products can be, and have been, highly successful within a small group of applications that…

  1. buy into the small menu of objects they do support, or
  2. don’t require support for very many new objects (e.g. changes to the supporting library), or
  3. don’t expect very many other applications to understand their data.

In other words, previous approaches have succeeded in integration on the small scale but hold little promise for integration on the large scale.

The key to integration and sharing of data on the large scale is to find a small set of primitive, yet mathematically meaningful, building blocks out of which descriptions for many different kinds of scientific data can be constructed. In this approach, each new and slightly different kind of data requires the application of the same building blocks to form a slightly different assembly. Since every assembly is just a different application of the same building blocks, each is fully supported by existing software. In fact, every assembly of building blocks is simply a model for an instance of some scientific data. This is precisely how SAF is designed to operate. For application developers using SAF, the act of sharing their data is one of literally modeling their data, not browsing a menu. This modeling is analogous to a user of a CAD/CAM tool applying constructive solid geometry (CSG) primitives to build an engineering model of some physical part. In a nutshell, the act of sharing data with SAF is one of scientific data modeling.

This requires a revolution in the way scientific computing application developers think about their data. The details of bits and bytes, arrays and lists are pushed to the background. These concepts are still essential but less so than the modeling primitives used to characterize scientific data. These modeling primitives are firmly rooted in the mathematics underlying most, if not all, scientific computing applications. By and large, this means the model primitives will embody the mathematical and physical notions of fields defined on base-spaces or sets.

The term field is used to describe any phenomenon that can be mathematically represented, at least locally, as a function over some, often continuous, base-space or domain. The term base-space is used to describe an infinite point set, often continuous, with a topological dimension over which fields are defined. Thus, SAF provides three key modeling primitives: fields, sets, and relations between these entities. Fields may represent real physical phenomena such as pressure, stress and velocity. Fields may be related to other fields by integral, derivative or algebraic equations. Fields are defined on sets. Sets may represent real physical objects such as parts in an assembly, materials and slide interfaces. And, sets may be related to other sets by set-algebraic equations involving union, intersection and difference.

A full description of the modeling principles upon which SAF is based is outside the scope of this programmer’s reference manual. User-quality tutorials on this material will be forthcoming as SAF evolves. However, the reader should pause for a moment and confirm in their own mind just how general the notions of field and set are in describing scientific data. The columns of an Excel spreadsheet are fields. A time history is a field. The coordinates of a mesh are a field. A plot dump is a whole bunch of related fields. An image is a field. A video is a field. A load curve is a field. Likewise for sets. An individual node or zone is a set. A processor domain is a set. An element block is a set. A slide line or surface is a set. A part in an assembly is a set. And so on.

Understanding and applying set, field and relation primitives to model scientific data represents a revolutionary departure from previous, menu-based approaches. SAF represents a first cut at a portable, parallel, high performance application programming interface for modeling scientific data. Over the course of development of SAF, the organizations involved have seen the value in applying this technology in several directions…

  1. A publish/subscribe scenario for exchanging data between scientific computing clients, in-situ.
  2. End-user tools for performing set operations and restricting fields to subsets of the base space to take a closer look at portions of tera-scale data.
  3. Operators which transform data during exchange between clients such as changing the processor decomposition, evaluation method, node-order over elements, units, precision, etc. on a field.
  4. Data consistency checkers which confirm that a given collection of scientific data does indeed conform to the mathematical and physical description that has been ascribed to it by its model. For example, that a volume or mass fraction field is indeed between 0.0 and 1.0 everywhere in its base-space.
  5. MPI-like parallel communication routines pitched in terms of sets and fields rather than data structures.

And many others.

While each of these areas shows promise, our first goal has been to demonstrate that we can apply this technology to do the same job we previously achieved with mesh-object I/O libraries like Silo and Exodus II. In other words, our first and foremost goal is to demonstrate that we can read and write shareable scientific data files with good performance. Such a capability is fundamental to the success of any organization involved in scientific computing. If we cannot demonstrate that, there is little point in trying to address these other areas of interest.