3.2.3 EPC External Part References and HDF5 Files

Topic Version: 2; Published: 11/11/2016
For Standard: EPC v1.0

In some cases, it may be desirable to store parts of an EPC file outside the file itself. The most common example is HDF5 files, which Energistics domain standards use to store large explicit arrays and numerical data.

Hierarchical Data Format (HDF) is a data model, library, and file format for storing and managing data. It supports an unlimited variety of data types and is designed for flexible, efficient I/O and for both high-volume and complex data, particularly when compared to XML. HDF version 5 (HDF5) is part of the Energistics Common Technical Architecture and is used in RESQML v2+ and with other Energistics standards.

The Energistics package is designed for data streaming and, in some implementations, limits the amount of data that may be included. In contrast, an HDF5 file is designed for large data files and random access (not streaming), and it can already compress its data sets. As a consequence, some Energistics standards (e.g., RESQML) require that HDF5 files be stored outside the Energistics package. To accurately maintain all relationships, the package requires use of an external reference to the HDF5 file.

The following items describe how to store and reference HDF5 files in the context of an EPC file.

  • HDF5 files may be stored inside or outside the EPC file.
  • When HDF5 files are used with Energistics standards, they shall use the file extension: .h5
  • If stored inside the EPC file, the mime type for an HDF5 file is: application/x-hdf5
  • If stored outside, it is recommended that the HDF5 files be stored in the same location as the EPC file.
  • If stored outside, each HDF5 file must have an EPC external part reference, which is a proxy that points to the HDF5 file. This reference must be stored inside the EPC file.
  • Each HDF5 EpcExternalPartReference must have a relationship file that contains an entry that specifies the actual location of its physical HDF5 file with the attribute “Target” set to the file name.
  • The corresponding entry in the relationship file must have a “type” attribute set to: http://schemas.energistics.org/package/2012/relationships/externalResource and the target mode set to “External”.
  • RECOMMENDATION: Set the “Id” attribute to “Hdf5File”.
  • As a top-level XML data object, an external part reference must have a UUID. To allow cross validation, this UUID must also be stored in the physical HDF5 file as an attribute named “uuid”. This attribute should be stored at the root level of the HDF5 file.
  • The data type of the UUID in the HDF5 file must be a single string of 36 characters. In the C version of the HDF5 library, this data type is H5T_C_S1. The UUID must be stored in its canonical format (lowercase letters only), including the 4 dashes.
  • The EpcExternalPartReference attributes “Filename” and “MimeType” should be null. These attributes are used for non-EPC transfers. A reader must use the Filename and MimeType from the relationship file.
  • An array may be too large to fit on one physical disk, so a single array may reference multiple HDF5 files. A single ExternalDataset can therefore contain multiple ExternalDatasetParts, each of which references a single EpcExternalPartReference. However, each EpcExternalPartReference can be associated with only one HDF5 file.
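
The relationship entry and the UUID format rule described in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names build_external_rels and is_canonical_uuid and the file name my_model.h5 are hypothetical, and the XML namespace and the attribute spellings (Id, Type, Target, TargetMode) are assumed to follow the Open Packaging Conventions on which EPC is based. Writing the “uuid” attribute into the HDF5 file itself requires an HDF5 library and is not shown.

```python
import re
import uuid
import xml.etree.ElementTree as ET

# Assumption: relationship files use the OPC relationships namespace.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Relationship type required for an external HDF5 file (from the list above).
EXTERNAL_RESOURCE = (
    "http://schemas.energistics.org/package/2012/relationships/externalResource"
)
# Canonical UUID text: 36 characters, lowercase hex digits, 4 dashes.
CANONICAL_UUID = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def is_canonical_uuid(text):
    """Return True if text is a 36-character lowercase UUID with 4 dashes."""
    return CANONICAL_UUID.fullmatch(text) is not None


def build_external_rels(hdf5_file_name, rel_id="Hdf5File"):
    """Return relationship XML that points at one external HDF5 file.

    Hypothetical helper: one Relationship entry per external part reference.
    """
    if not hdf5_file_name.endswith(".h5"):
        raise ValueError("HDF5 files used with Energistics standards end in .h5")
    # Serialize with a default namespace instead of an ns0: prefix.
    ET.register_namespace("", RELS_NS)
    root = ET.Element(f"{{{RELS_NS}}}Relationships")
    ET.SubElement(
        root,
        f"{{{RELS_NS}}}Relationship",
        {
            "Id": rel_id,                 # recommended value
            "Type": EXTERNAL_RESOURCE,    # required relationship type
            "Target": hdf5_file_name,     # actual location of the HDF5 file
            "TargetMode": "External",     # file lives outside the package
        },
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # str() of a uuid.UUID is lowercase with dashes, so it is already canonical.
    part_uuid = str(uuid.uuid4())
    assert is_canonical_uuid(part_uuid)
    print(build_external_rels("my_model.h5"))
```

A reader could apply the same check in reverse: read the “uuid” attribute from the root of the HDF5 file, confirm it is canonical, and compare it with the UUID of the EpcExternalPartReference whose relationship entry targets that file.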