20.5 Relationships Between XML Metadata and Arrays

Topic Version: 2; Published: 04/16/2018
For Standard: PRODML v2.0

This section provides a detailed walk-through of how the relationships between the various files of the worked example are specified.

Figure 20.5-1 shows the conceptual data model and HDF5 storage for the arrays in the worked example, and Figure 20.5-2 shows how this model is implemented in EPC and HDF5. Note that this is the option referred to as “hybrid” in Section 19.5.3 HDF5 File Array Configuration Options.

Figure 20.5-1 DAS worked example conceptual model with HDF5, which is a “hybrid” implementation (from the list in Section 19.5).

The XML files are all contained in the EPC file. The HDF5 files are transferred separately. Thus the worked example, which shows the arrays split over 2 HDF5 files (see Figure 20.5-1), comprises 3 files in total: one with the extension .epc and two HDF5 files with the .h5 extension. These 3 files are shown in Figure 20.1-1 and at the top of Figure 20.5-2.

Opening the EPC file (i.e., unzipping it) shows 5 XML data files: the DAS Acquisition, Instrument Box, and Optical Path files (blue colored arrows) and the two EPC external part reference files (red colored arrows); see Figure 20.2-1 and the middle of Figure 20.5-2.
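As an illustration of the unzipping step, the following is a minimal Python sketch that lists the data files and the .rels files in an EPC package using only the standard library. The part names, the `_rels` folder name, and the presence of a `[Content_Types].xml` part are assumptions made for this sketch (they follow the usual Open Packaging Conventions layout) and are not taken from the worked example; `build_demo_epc` and `list_epc_parts` are hypothetical helper names.

```python
import io
import zipfile


def build_demo_epc():
    """Build a tiny in-memory stand-in for an .epc file.

    All part names below are hypothetical placeholders, not the
    names used in the worked example.
    """
    data_parts = [
        "DasAcquisition_uuid-a.xml",
        "DasInstrumentBox_uuid-b.xml",
        "FiberOpticalPath_uuid-c.xml",
        "EpcExternalPartReference_uuid-d.xml",
        "EpcExternalPartReference_uuid-e.xml",
    ]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as epc:
        # Assumed package-level part, as in other ZIP-based packages.
        epc.writestr("[Content_Types].xml", "<Types/>")
        for name in data_parts:
            epc.writestr(name, "<placeholder/>")
            # One relationship file per data file (folder name assumed).
            epc.writestr("_rels/" + name + ".rels", "<Relationships/>")
    return buffer.getvalue()


def list_epc_parts(epc_bytes):
    """Return (data_files, rels_files) found in an EPC package."""
    with zipfile.ZipFile(io.BytesIO(epc_bytes)) as epc:
        names = epc.namelist()
    rels_files = sorted(n for n in names if n.endswith(".xml.rels"))
    data_files = sorted(
        n for n in names
        if n.endswith(".xml") and "/" not in n and n != "[Content_Types].xml"
    )
    return data_files, rels_files


data_files, rels_files = list_epc_parts(build_demo_epc())
print(len(data_files), "data files;", len(rels_files), "rels files")
```

For a package on disk, the same function could be fed the bytes read from the .epc file; only the in-memory demo package here is invented.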

Relationships to related files are stored in the rels folder. For each data file, there is a corresponding .rels file which stores the relationship of that file to the other files. These files have the extension .xml.rels.

The .rels files specify the relationships among ALL files: both the XML files stored internally in the EPC file and the externally stored HDF5 files (see the bottom of Figure 20.5-2).

  • Relationships to related files that are stored internally in the EPC file (such as the Instrument Box and Optical Path files in this example) are specified using an internal EPC mechanism with a TargetMode “Internal”.
  • Files that are stored externally to the EPC file (such as the 2 HDF5 files in this example) are specified using an EPC mechanism called external part reference. Each external file has a corresponding external part reference XML file.

The content of these relationship files and external part reference files is explained below.

Figure 20.5-2 The 3 files comprising the worked example (top); the content of the EPC file (middle)—5 XML files; and the content of the Rels folder within the EPC file—one relationship file per XML data object (bottom).

In the main folder of the EPC file, the DAS Acquisition XML file contains the metadata as outlined previously. In the middle of Figure 20.5-3, extracts from this file are shown in a blue box. The UUID of the whole DAS Acquisition is an attribute of this file. The incorporation of the UUID into the file name itself, as shown, is also recommended (see the blue rectangles in the figure).

Similarly, each EpcExternalPartReference file (the red rectangle in the lower part of Figure 20.5-3) has its own UUID. These files act as proxies for the HDF5 files. The DAS Acquisition XML uses these UUIDs, in its EpcExternalPartReference element, to reference the external part (HDF5 data) concerned. This is shown by the red rectangle in the middle of Figure 20.5-3.

Because an EpcExternalPartReference refers to an HDF5 file which may contain multiple arrays, every array referenced from within the DAS Acquisition XML must have a specified (unique) path in the HDF5 file. This is shown by the green rectangle in the middle part of Figure 20.5-3.

Figure 20.5-3 The 5 files in the EPC root folder (DAS Acquisition, DAS Instrument Box, Optical Path and two EpcExternalPartReference) (top); the DAS acquisition file showing an example of a reference to a single HDF5 array with the path to the H5 file (green) and UUID of EpcExternalPartReference (red) (middle); and how this UUID is located in the EpcExternalPartReference XML itself (bottom).

The rels folder contains one file (extension .xml.rels) per file contained in the root folder (see the top snippet of Figure 20.5-4, light brown border).

The rels file for the DasAcquisition lists all the relationships for this file (see the middle snippet of Figure 20.5-4, blue border). There are relationships to two files internal to the EPC, Optical Path and Instrument Box (purple and brick red boxes), and to the two EpcExternalPartReference files that act as proxies for the external HDF5 files (red boxes).

The rels file for each EpcExternalPartReference file shows a relationship back to the DasAcquisition (blue box showing its UUID) and a relationship to an external file, the .h5 file (green box showing the file name itself); see the bottom snippet of Figure 20.5-4, red border.

By these means, the references all tie in with each other and the specific path to an array in a specific HDF5 file can be discovered.
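The lookup described above can be sketched in Python as follows. This is an illustrative sketch only: the sample .rels content, the relationship Type values, the file names, and the array path are hypothetical placeholders, and `split_relationships` and `resolve_array` are invented helper names. The sketch assumes the .rels files use the Open Packaging Conventions relationships namespace, in which a Relationship without a TargetMode attribute is treated as Internal.

```python
import xml.etree.ElementTree as ET

# Assumed: .rels parts use the OPC relationships namespace.
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"

# Hypothetical rels content for one EpcExternalPartReference file:
# one link back to the DasAcquisition, one link to the external .h5 file.
SAMPLE_RELS = """
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rel-acq" Type="urn:example:back-to-acquisition"
                Target="DasAcquisition_uuid-a.xml"/>
  <Relationship Id="rel-h5" Type="urn:example:external-file"
                Target="example_part1.h5" TargetMode="External"/>
</Relationships>
"""


def split_relationships(rels_xml):
    """Return (internal_targets, external_targets) from a .rels document."""
    root = ET.fromstring(rels_xml)
    internal, external = [], []
    for rel in root.iter(RELS_NS + "Relationship"):
        # A missing TargetMode is treated as Internal.
        mode = rel.get("TargetMode", "Internal")
        target = rel.get("Target")
        if mode == "External":
            external.append(target)
        else:
            internal.append(target)
    return internal, external


def resolve_array(rels_xml, path_in_file):
    """Pair the external HDF5 file name with the array path inside it."""
    _, external = split_relationships(rels_xml)
    if len(external) != 1:
        raise ValueError("expected exactly one external target")
    return external[0], path_in_file


# The array path below is a placeholder for a path read from the XML.
print(resolve_array(SAMPLE_RELS, "/Acquisition/Raw[0]/RawData"))
```

In a real package, the rels text would be read from the relationship file that corresponds to the EpcExternalPartReference whose UUID appears in the array reference.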

Figure 20.5-4 Showing the 5 files in the EPC rels folder (one for each data file comprising the set) (top); the relationships existing from the DASAcquisition with the two internal XML files, plus two EpcExternalPartReferences (middle); and the relationships existing from the EpcExternalPartReference to its parent DAS acquisition and to the physical H5 file (bottom).

When arrays are split over multiple HDF5 files (as they are in the worked example), a single logical array in the DasAcquisition XML (e.g., a Raw array) contains 2 ExternalFileProxy elements, one per file. Figure 20.5-5 shows, for the same worked example, how the Count and StartIndex elements define the partitioning of the array across the two files. The paths for both arrays are shown there. Note that Count contains the number of samples in the array and StartIndex contains the number of the 'scan'. To find the physical HDF5 file name, look in the rels file for the EpcExternalPartReference, as explained above.

The DasExternalFile proxy elements PartStartTime and PartEndTime show the start and end times of the partial Raw DAS data array stored in each HDF5 file.
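To make the partitioning concrete, the following Python sketch locates the part that holds a given scan. It rests on assumptions that go beyond the text: that StartIndex is the number of the first scan held in a part, that Count is the total number of samples in the part (scans multiplied by loci), and that the number of loci per scan is known. The `Part` class, the `locate_scan` function, and all values are hypothetical and are not taken from the worked example.

```python
from dataclasses import dataclass


@dataclass
class Part:
    """One proxy entry for a partial array (hypothetical structure)."""
    proxy_uuid: str   # UUID of the EpcExternalPartReference (placeholder)
    start_index: int  # assumed: number of the first scan in this part
    count: int        # assumed: total samples in this part


def locate_scan(parts, scan, loci_per_scan):
    """Return (proxy_uuid, row_within_part) for a global scan number."""
    for part in parts:
        scans_in_part, remainder = divmod(part.count, loci_per_scan)
        if remainder:
            raise ValueError("count is not a whole number of scans")
        if part.start_index <= scan < part.start_index + scans_in_part:
            return part.proxy_uuid, scan - part.start_index
    raise IndexError("scan is not covered by any part")


# Invented numbers: 5 loci per scan, 6 scans in part 1, 4 scans in part 2.
PARTS = [Part("proxy-1", 0, 30), Part("proxy-2", 6, 20)]
print(locate_scan(PARTS, 7, 5))
```

If StartIndex or Count are defined differently in a given data set, only the arithmetic inside `locate_scan` would need to change.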

Figure 20.5-5 Shows how a single array (raw data in this case) is split across two physical files per the worked example and the figures above. The files are identified by green bubble/arrow/label (1st file, bottom left) and by a purple bubble/arrow/label (2nd file, bottom right).

Processed data, i.e., FBE band or spectra data, is stored and referenced the same way (Figure 20.5-6).

Figure 20.5-6 Spectra processed data referenced from the XML. In this case, each FBE band has a unique array name in the HDF5 file, and these can be seen in the PathInExternalFile elements in the snippets.

The above description has focused on how to navigate from the XML, via the data in the EPC, to the required arrays in the HDF5 files. The HDF5 files also contain identification data: the root level of the .h5 file contains the UUID of the EpcExternalPartReference, and the Acquisition group contains the UUID of the DasAcquisition XML. Every HDF5 file that is part of the sequence of HDF5 files has this same DasAcquisition UUID, so that if an HDF5 file gets separated (e.g., a disk is misplaced), it can be associated with the right acquisition.

Figure 20.5-7 DAS worked example XML (displayed in a text editor) and related HDF5 files (displayed with an HDF5 utility), showing where EpcExternalPartReference is used to reference the .h5 files. The .h5 files contain the uuid, which is the UUID of the DasAcquisition XML, allowing a reference back to the XML that describes the whole acquisition.