20.5 Relationships Between XML Metadata and Arrays
Topic Version: 2 | Published: 04/16/2018
For Standard: PRODML v2.0
This section provides a detailed walk-through of how the relationships between the various files of the worked example are specified.
Figure 20.5-1 shows the conceptual data model and HDF5 storage for the arrays in the worked example, and Figure 20.5-2 shows how this model is implemented in EPC and HDF5. Note that this is the option referred to as "hybrid" in Section 19.5.3, HDF5 File Array Configuration Options.
The XML files are all contained in the EPC file; the HDF5 files are transferred separately. Thus, in the worked example, where the arrays are split over 2 HDF5 files (see Figure 20.5-1), there are 3 files in total. Figure 20.1-1 and the top of Figure 20.5-2 both show these 3 files: one with the .epc extension and two HDF5 files with the .h5 extension.
Opening the EPC file (i.e., unzipping it) shows 5 XML data files: the DAS Acquisition, Instrument Box, and Optical Path files (blue arrows) and the two EPC external part reference files (red arrows) (see Figure 20.2-1 and the middle of Figure 20.5-2).
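Because an EPC file is a zip archive, its contents can be listed with any zip tool. The minimal Python sketch below, using only the standard library, separates data parts from relationship parts. It assumes the usual Open Packaging Conventions layout (a [Content_Types].xml entry and .rels files in _rels folders) and assumes that anything under docProps/ holds package properties rather than data objects; the part names in the test are hypothetical.

```python
import io
import zipfile


def list_epc_parts(epc_bytes: bytes) -> dict[str, list[str]]:
    """Split the members of an EPC (zip) archive into data parts and relationship parts."""
    with zipfile.ZipFile(io.BytesIO(epc_bytes)) as epc:
        names = epc.namelist()
    # Relationship parts end in ".rels" (for example "_rels/<part>.xml.rels").
    rels = sorted(n for n in names if n.endswith(".rels"))
    # Data objects are the remaining XML parts; skip package bookkeeping
    # ([Content_Types].xml) and, by assumption, anything under docProps/.
    data = sorted(
        n for n in names
        if n.endswith(".xml")
        and n != "[Content_Types].xml"
        and not n.startswith("docProps/")
    )
    return {"data": data, "rels": rels}
```

For the worked example, the "data" list would be expected to hold the 5 XML data files described above.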
Relationships between files are stored in the rels folder. For each data file, there is a corresponding .rels file that stores the relationships of that file to the other files. These files have the extension .xml.rels.
The .rels files specify the relationships among ALL files: the XML files stored internally in the EPC file and the externally stored HDF5 files (see the bottom of Figure 20.5-2).
- Relationships to related files that are stored internally in the EPC file (such as the Instrument Box and Optical Path files in this example) are specified using an internal EPC mechanism with a TargetMode “Internal”.
- Files that are stored externally to the EPC file (such as the 2 HDF5 files in this example) are specified using an EPC mechanism called external part reference. Each external file has a corresponding external part reference XML file.
The content of these files is explained below.
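As a rough illustration of the two mechanisms, the sketch below reads the text of one .rels file and separates its targets by TargetMode, using Python's standard xml.etree.ElementTree. It relies on the Open Packaging Conventions rule that TargetMode is optional and defaults to Internal. In the worked example one would expect the DasAcquisition's .rels file to list the EpcExternalPartReference XML files as internal targets, with only each proxy's own .rels file pointing outward to an .h5 file. The target names in the test are hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace of OPC relationship parts (.rels files).
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def split_relationships(rels_xml: str) -> tuple[list[str], list[str]]:
    """Return (internal_targets, external_targets) listed in a .rels part."""
    root = ET.fromstring(rels_xml)
    internal: list[str] = []
    external: list[str] = []
    for rel in root.findall(f"{RELS_NS}Relationship"):
        # TargetMode is optional; when it is absent the target is Internal.
        if rel.get("TargetMode", "Internal") == "External":
            external.append(rel.get("Target"))
        else:
            internal.append(rel.get("Target"))
    return internal, external
```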
In the main folder of the EPC file, the DAS Acquisition XML file contains the metadata outlined previously. Extracts from this file are shown in a blue box in the middle of Figure 20.5-3. The UUID of the whole DAS Acquisition is an attribute of this file. Incorporating the UUID into the file name itself, as shown, is also recommended (see the blue rectangles in the figure).
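A minimal sketch of reading that UUID and checking the file-name recommendation follows. It assumes the UUID is carried as an attribute named uuid on the root element, and that "incorporating the UUID into the file name" simply means the UUID string appears somewhere in the part name. The namespace and file names in the test are placeholders.

```python
import xml.etree.ElementTree as ET


def object_uuid(xml_text: str) -> str:
    """Read the uuid attribute from the root element of a data-object XML part."""
    root = ET.fromstring(xml_text)
    uuid = root.get("uuid")  # assumed attribute name
    if uuid is None:
        raise ValueError("root element has no uuid attribute")
    return uuid


def name_embeds_uuid(part_name: str, uuid: str) -> bool:
    """True if the part's file name contains the UUID (case-insensitive)."""
    return uuid.lower() in part_name.lower()
```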
Similarly, the EpcExternalPartReference files (the red rectangle in the lower part of Figure 20.5-3) have their own UUIDs. These files act as proxies for the HDF5 files. The DAS Acquisition XML uses these UUIDs, in its EpcExternalPartReference elements, to reference the external part (HDF5 data) concerned. This is shown by the red rectangle in the middle of Figure 20.5-3.
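The sketch below collects the UUIDs referenced through EpcExternalPartReference elements anywhere in a DAS Acquisition XML. It matches elements by local name, so the namespace does not matter, and it assumes that each reference carries its target UUID in a child element named Uuid; the exact element structure should be checked against the schema.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def external_part_uuids(das_xml: str) -> set[str]:
    """Collect the UUIDs of all EpcExternalPartReference elements in a DAS Acquisition XML."""
    uuids = set()
    for elem in ET.fromstring(das_xml).iter():
        if local_name(elem.tag) != "EpcExternalPartReference":
            continue
        for child in elem:
            # Assumed: the referenced UUID sits in a child element named "Uuid".
            if local_name(child.tag) == "Uuid" and child.text:
                uuids.add(child.text.strip())
    return uuids
```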
Because an EpcExternalPartReference refers to an HDF5 file that may contain multiple arrays, every array referenced from within the DAS Acquisition XML must also specify a unique path within that HDF5 file. This is shown by the green rectangle in the middle of Figure 20.5-3.
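One way to picture this is that each array is located by a pair (external part UUID, path in HDF5 file), and no two arrays may claim the same pair. The sketch below flags any such collision. The ArrayRef type and the HDF5 paths in the test are hypothetical illustrations, not names defined by the standard.

```python
from collections import defaultdict
from typing import NamedTuple


class ArrayRef(NamedTuple):
    """Hypothetical locator: external part (HDF5 proxy) UUID plus path inside that file."""
    part_uuid: str
    path: str


def shared_paths(arrays: dict[str, ArrayRef]) -> dict[ArrayRef, list[str]]:
    """Return every (part UUID, path) pair claimed by more than one array, with its claimants."""
    owners = defaultdict(list)
    for name, ref in arrays.items():
        owners[ref].append(name)
    return {ref: sorted(names) for ref, names in owners.items() if len(names) > 1}
```

The same path may legitimately appear in two different HDF5 files, which is why the UUID is part of the key.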
The rels folder contains one file (extension .xml.rels) per file contained in the root folder; see the top snippet of Figure 20.5-4 (light brown border).
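Under the Open Packaging Conventions, the relationship part for a given part lives in a folder named _rels next to that part, with .rels appended to the part's file name. A one-function sketch of this naming rule follows; the part names in the test are hypothetical.

```python
import posixpath


def rels_part_name(part_name: str) -> str:
    """Return the name of the .rels part that describes part_name's relationships."""
    folder, filename = posixpath.split(part_name)
    # "dir/obj.xml" -> "dir/_rels/obj.xml.rels"; a root-level part gets "_rels/obj.xml.rels".
    return posixpath.join(folder, "_rels", filename + ".rels")
```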
The rels file for the DasAcquisition lists all the relationships for this file; see the middle snippet of Figure 20.5-4 (blue border). There are relationships to two files internal to the EPC, Optical Path and Instrument Box (purple and brick red boxes), and to the two EpcExternalPartReference files that stand in for the external HDF5 files (red boxes).
The rels files for the EpcExternalPartReference files each show a relationship back to the DasAcquisition (blue box showing its UUID) and a relationship to an external file, the .h5 file (green box showing the file name itself); see the bottom snippet of Figure 20.5-4 (red border).
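A minimal sketch of extracting the HDF5 file name from an EpcExternalPartReference's .rels text is shown below. It assumes exactly one relationship in that file carries TargetMode="External" and that its Target is the .h5 file name; the relationship Type values in the test are placeholders.

```python
import xml.etree.ElementTree as ET

# XML namespace of OPC relationship parts (.rels files).
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def hdf5_file_for_proxy(proxy_rels_xml: str) -> str:
    """Return the external .h5 target named in an EpcExternalPartReference's .rels part."""
    root = ET.fromstring(proxy_rels_xml)
    targets = [
        rel.get("Target")
        for rel in root.findall(f"{RELS_NS}Relationship")
        if rel.get("TargetMode") == "External"
    ]
    # Assumed: one proxy stands for exactly one HDF5 file.
    if len(targets) != 1:
        raise ValueError(f"expected one external target, found {len(targets)}")
    return targets[0]
```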
In this way, the references all tie in with each other, and the specific path to an array in a specific HDF5 file can be discovered.
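The whole chain can be summarized as a lookup: each array reference in the DAS Acquisition XML gives an EpcExternalPartReference UUID and a path; the proxy's .rels file turns that UUID into an .h5 file name; and the path is then used unchanged inside that file. The sketch below assumes both lookups have already been read into plain dictionaries; the names and paths in the test are hypothetical.

```python
def resolve_arrays(
    refs: dict[str, tuple[str, str]],
    proxy_to_h5: dict[str, str],
) -> dict[str, tuple[str, str]]:
    """Map each array name to (HDF5 file name, path inside that file).

    refs: array name -> (EpcExternalPartReference UUID, path in HDF5 file),
          as read from the DAS Acquisition XML.
    proxy_to_h5: EpcExternalPartReference UUID -> .h5 file name,
          as read from each proxy's .rels part.
    """
    resolved = {}
    for name, (part_uuid, path) in refs.items():
        if part_uuid not in proxy_to_h5:
            raise KeyError(f"array {name!r}: no .rels entry for external part {part_uuid}")
        resolved[name] = (proxy_to_h5[part_uuid], path)
    return resolved
```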
When arrays are split over multiple HDF5 files (as they are in the worked example), a single logical array in the DasAcquisition XML (e.g., a Raw array) contains 2 ExternalFileProxy elements. Figure 20.5-5 shows the same worked example and how the Count and StartIndex elements define the partitioning of the array across the two files; the paths for both partial arrays are also shown. Note that Count contains the number of samples in the partial array, and StartIndex contains the number of the 'scan' at which that partial array starts. To find the physical HDF5 file name, look in the rels file for the EpcExternalPartReference, as explained above.
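The StartIndex/Count partitioning can be used to find which HDF5 file holds a given scan. The sketch below assumes zero-based scan indices and parts that tile the logical array without gaps or overlaps, which check_contiguous verifies; the Part type is a hypothetical stand-in for an ExternalFileProxy element, and the file names and counts in the test are invented.

```python
from typing import NamedTuple


class Part(NamedTuple):
    """Hypothetical stand-in for one ExternalFileProxy element of a split array."""
    h5_file: str
    path: str
    start_index: int  # StartIndex: first scan held in this part (assumed zero-based)
    count: int        # Count: number of scans held in this part


def check_contiguous(parts: list[Part]) -> None:
    """Raise ValueError unless the parts cover scans 0..N-1 with no gap or overlap."""
    expected = 0
    for p in sorted(parts, key=lambda p: p.start_index):
        if p.start_index != expected:
            raise ValueError(f"gap or overlap at scan {expected}")
        expected += p.count


def locate_scan(parts: list[Part], scan: int) -> tuple[Part, int]:
    """Return the part holding a global scan index and the local index within that part."""
    for p in parts:
        if p.start_index <= scan < p.start_index + p.count:
            return p, scan - p.start_index
    raise IndexError(f"scan {scan} is not in any part")
```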
The PartStartTime and PartEndTime elements of each DasExternalFile proxy element give the start and end times of the partial Raw DAS data array stored in each HDF5 file.
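These times allow a reader to open only the HDF5 files relevant to a time window. The sketch below treats each part as the closed interval [PartStartTime, PartEndTime] and returns the files whose interval overlaps the requested window. The TimedPart type is hypothetical, and the timestamps in the test are invented and written with explicit UTC offsets.

```python
from datetime import datetime
from typing import NamedTuple


class TimedPart(NamedTuple):
    """Hypothetical record of one partial array and its PartStartTime/PartEndTime."""
    h5_file: str
    part_start: datetime
    part_end: datetime


def parts_overlapping(parts: list[TimedPart], t0: datetime, t1: datetime) -> list[str]:
    """Return the .h5 files whose [PartStartTime, PartEndTime] interval overlaps [t0, t1]."""
    return [p.h5_file for p in parts if p.part_start <= t1 and p.part_end >= t0]


def ts(text: str) -> datetime:
    """Parse an ISO 8601 timestamp that includes an explicit UTC offset."""
    return datetime.fromisoformat(text)
```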
Processed data, i.e., FBE band or spectra data, is stored and referenced the same way (Figure 20.5-6).
The above description has focused on how to navigate from the XML, via the data in the EPC, to the required arrays in the HDF5 files. The HDF5 files also contain identification data: the root level of each .h5 file contains the UUID of its EpcExternalPartReference, and the Acquisition group contains the UUID of the DasAcquisition XML. Every HDF5 file that is part of the sequence of HDF5 files carries this same DasAcquisition UUID, so that if an HDF5 file becomes separated (e.g., a disk is misplaced), it can still be associated with the right acquisition.
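The recovery case can be sketched as grouping stray HDF5 files by the DasAcquisition UUID they carry. The sketch below leaves out how the identifiers are read from each file (a library such as h5py exposes HDF5 attributes through an attrs mapping on files and groups); the dictionary keys part_uuid and acquisition_uuid are hypothetical names, as are the file names in the test.

```python
from collections import defaultdict


def group_by_acquisition(file_ids: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Group HDF5 file names by the DasAcquisition UUID recorded in each file.

    file_ids maps file name -> {"part_uuid": ..., "acquisition_uuid": ...}
    (hypothetical keys holding the values read from the root level and
    the Acquisition group of each file).
    """
    groups = defaultdict(list)
    for name, ids in file_ids.items():
        groups[ids["acquisition_uuid"]].append(name)
    return {acq: sorted(names) for acq, names in groups.items()}
```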