HDF5 Files and EPC External Part Reference

Topic Version 1, Published 09/11/2015
For Standard: RESQML v2.0.1

In some cases, it may be desirable to store parts of an Energistics package outside the EPC file. The most common example is HDF5 files, which store all of the explicit array data in RESQML.

Hierarchical Data Format (HDF) is a data model, library, and file format for storing and managing data. It supports an unlimited variety of data types and is designed for flexible, efficient I/O and for high-volume, complex data, particularly when compared to XML. HDF version 5 is part of the Energistics Common Technical Architecture and is used in RESQML v2+ and in other Energistics standards. (For more information on how HDF5 is used in RESQML, see 6.2.2 Multi-Dimensional Arrays and HDF5 Data Storage and Chapter 18 Appendix: HDF5 Implementation Overview.)

The EPC file is designed for data streaming and, in some implementations, limits the amount of data it can contain. In contrast, an HDF5 file is designed for large data sets and random (not streaming) access, and it natively supports compression of its data sets. For these reasons, RESQML requires that HDF5 files be stored outside the EPC file. To maintain all relationships accurately, the EPC file must then use an external reference to each HDF5 file.

The following items describe how to store and reference HDF5 files in the context of an EPC file and RESQML:

  • HDF5 files must be stored outside the EPC file.
  • When HDF5 files are used with Energistics standards, they must use the file extension .h5
  • It is recommended that the HDF5 files be stored in the same location as the EPC file.
  • Each HDF5 file must have an EPC external part reference, which is a proxy that points to the HDF5 file. This reference must be stored inside the EPC file.
  • The HDF5 external part reference must have a relationship file that defines the actual location of the physical HDF5 file (which is recommended to be the same location as the EPC file).
  • The corresponding entry in the relationship file must have a type attribute set to: http://schemas.energistics.org/package/2012/relationships/externalResource
  • As an XML data object, an external part reference must have a UUID. This UUID must also be stored as an attribute of the physical HDF5 file to allow cross-validation.
  • The UUID stored in the HDF5 file must use the data type H5T_C_S1 (from the HDF5 documentation) and be written in its canonical format (lowercase letters only).
  • Multiple HDF5 files can be used to store the array data of multiple Energistics parts. Although not recommended, one XML data object can reference multiple EPC external part references. However, one EPC external part reference can be associated with only one HDF5 file.
  • The XML data object must reference the HDF5 external part reference.
  • The corresponding entry in the relationship file of the XML data object must have a type attribute set to: http://schemas.energistics.org/package/2012/relationships/mlToExternalPartProxy
  • The corresponding backward entry (in the relationship file of the proxy) must have a type attribute set to: http://schemas.energistics.org/package/2012/relationships/externalPartProxyToMl
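The relationship entries described in the list above can be sketched in Python using only the standard library's `xml.etree.ElementTree`. This is a minimal illustration under stated assumptions, not a reference implementation: the OPC relationships namespace and the `TargetMode="External"` attribute come from the Open Packaging Conventions on which EPC is built, while the helper names (`make_rels`, `proxy_rels`, `data_object_rels`), the `rId` identifiers, and any part names passed in are hypothetical. `proxy_rels` deliberately accepts a single HDF5 file name, reflecting the rule that one external part reference is associated with only one HDF5 file.

```python
import xml.etree.ElementTree as ET

# Standard OPC relationships namespace (EPC is built on OPC).
OPC_RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
ET.register_namespace("", OPC_RELS_NS)  # serialize as the default namespace

# Relationship types quoted in the list above.
EXTERNAL_RESOURCE = "http://schemas.energistics.org/package/2012/relationships/externalResource"
ML_TO_PROXY = "http://schemas.energistics.org/package/2012/relationships/mlToExternalPartProxy"
PROXY_TO_ML = "http://schemas.energistics.org/package/2012/relationships/externalPartProxyToMl"


def make_rels(entries):
    """Build an OPC .rels document from (id, type, target, external) tuples."""
    root = ET.Element(f"{{{OPC_RELS_NS}}}Relationships")
    for rel_id, rel_type, target, external in entries:
        attrs = {"Id": rel_id, "Type": rel_type, "Target": target}
        if external:
            # Targets that live outside the package are flagged as external.
            attrs["TargetMode"] = "External"
        ET.SubElement(root, f"{{{OPC_RELS_NS}}}Relationship", attrs)
    return ET.tostring(root, encoding="unicode")


def proxy_rels(h5_filename, data_object_parts):
    """Relationship file of the proxy: exactly one HDF5 file plus back-links
    (externalPartProxyToMl) to every data object that references the proxy."""
    entries = [("rId0", EXTERNAL_RESOURCE, h5_filename, True)]
    for i, part in enumerate(data_object_parts, start=1):
        entries.append((f"rId{i}", PROXY_TO_ML, part, False))
    return make_rels(entries)


def data_object_rels(proxy_part):
    """Relationship file of a data object: forward link to its proxy."""
    return make_rels([("rId0", ML_TO_PROXY, proxy_part, False)])
```

For example, `proxy_rels("grid.h5", ["obj_Grid_example.xml"])` would return the two entries of the proxy's relationship file, and `data_object_rels("obj_Proxy_example.xml")` the matching forward entry for the data object; both part names are placeholders. Relationship targets in OPC are resolved relative to the source part, so real packages may need adjusted paths.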
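The UUID cross-validation rule can also be sketched in Python. The sketch assumes the UUID attribute has already been read from the HDF5 file by whatever HDF5 library is in use (the attribute's name is not given in this section, so reading it is left out), and the function names `is_canonical_uuid` and `uuids_match` are hypothetical. Python's `uuid` module normalizes every UUID spelling it accepts to the lowercase, hyphenated 8-4-4-4-12 form, so a round trip through `uuid.UUID` is a convenient canonical-format check.

```python
import uuid


def is_canonical_uuid(text: str) -> bool:
    """True if text is a UUID in canonical 8-4-4-4-12 form with lowercase hex."""
    try:
        parsed = uuid.UUID(text)
    except ValueError:
        return False
    # str(UUID) always yields the lowercase hyphenated form, so the round trip
    # rejects uppercase letters, braces, "urn:uuid:" prefixes and missing hyphens.
    return str(parsed) == text


def uuids_match(proxy_uuid: str, h5_uuid_attr) -> bool:
    """Cross-validate the proxy's UUID against the value read from the HDF5 file."""
    if isinstance(h5_uuid_attr, bytes):
        # Fixed-length C strings may come back from HDF5 readers as bytes,
        # possibly NUL-padded; decode and strip the padding.
        h5_uuid_attr = h5_uuid_attr.decode("ascii").rstrip("\x00")
    return is_canonical_uuid(h5_uuid_attr) and h5_uuid_attr == proxy_uuid
```

Requiring an exact string match (rather than comparing parsed `uuid.UUID` objects) is intentional here: it also catches files whose UUID is valid but not stored in the required lowercase form.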