6.2.2 Multi-Dimensional Arrays and HDF5 Data Storage
Topic Version: 1 | Published: 09/11/2015
For Standard: RESQML v2.0.1
Each element of a multi-dimensional array within a representation must have a well-defined 1D index, to allow elements to be uniquely referenced for properties, geometry, data storage or any other purposes, e.g., subrepresentations. The data ordering is uniquely specified at the representation level and this ordering is inherited by the array constructions for points and geometry.
For example:
- For a 2D array (N1 x N2) with indices I1=0,…,N1-1 and I2=0,…,N2-1, the 1D index is I1+N1*I2.
- For a 3D array (N1 x N2 x N3) with indices I1=0,…,N1-1, I2=0,…,N2-1 and I3=0,…,N3-1, the 1D index is I1+N1*I2+N1*N2*I3.
This ordering choice is sometimes called “fastest to slowest”, with the first index in the equation varying the fastest and the last index varying the slowest. RESQML is not restricted to 3D arrays; for example, the “faces per cell” on an IJK grid follow a 4D (6 x NI x NJ x NK) array indexing.
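The index calculation above generalizes to any number of dimensions. The following is a minimal Python sketch of that generalization; the function name `resqml_index_1d` is hypothetical and is not part of the RESQML standard or of any library.

```python
def resqml_index_1d(indices, dims):
    """Return the 1D index for `indices` in an array with sizes `dims`.

    Both sequences are listed fastest to slowest (I1, I2, ...), following
    the formula I1 + N1*I2 + N1*N2*I3 + ...
    """
    if len(indices) != len(dims):
        raise ValueError("indices and dims must have the same length")
    index_1d = 0
    stride = 1  # product of the sizes of all faster dimensions
    for i, n in zip(indices, dims):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension of size {n}")
        index_1d += stride * i
        stride *= n
    return index_1d


# 2D example: N1=4, N2=3, element (I1=2, I2=1) -> 2 + 4*1
print(resqml_index_1d((2, 1), (4, 3)))  # 6

# 4D "faces per cell" example: 6 x NI x NJ x NK with NI=2, NJ=3, NK=4
print(resqml_index_1d((5, 1, 2, 3), (6, 2, 3, 4)))  # 5 + 6*1 + 12*2 + 36*3 = 143
```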
HDF5 Data Storage. When stored in HDF5, the data storage order is the RESQML index order. It is very important to understand this relationship between indexing and data storage within RESQML. The index order for the elements within a representation is specified by the schema documentation, and this order is preserved in the HDF5 data storage. However, because HDF5 lists array dimensions from slowest to fastest, an N1 x N2 RESQML array is stored as an N2*N1 HDF5 array (N1 fastest, N2 slowest). To avoid confusion, any use of the words “first” and “last” must clearly distinguish between the RESQML index calculation and the HDF5 data storage context. The important point: when viewed as an equivalent 1D array, the HDF5 data storage ordering and the RESQML index ordering are identical.
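The equivalence of the two orderings can be checked with a short sketch. It assumes the HDF5 array uses row-major (C order) storage, in which the last listed dimension varies fastest; all function names are hypothetical and chosen only for illustration.

```python
def hdf5_shape(resqml_dims):
    """HDF5 shape for RESQML sizes listed fastest to slowest (N1, N2, ...)."""
    return tuple(reversed(resqml_dims))


def resqml_index_1d(indices, dims):
    """RESQML 1D index: I1 + N1*I2 + N1*N2*I3 + ... (first index fastest)."""
    index_1d, stride = 0, 1
    for i, n in zip(indices, dims):
        index_1d += stride * i
        stride *= n
    return index_1d


def row_major_offset(hdf5_indices, shape):
    """Flat offset in a row-major (C order) array, last index fastest."""
    offset = 0
    for i, n in zip(hdf5_indices, shape):
        offset = offset * n + i
    return offset


n1, n2 = 4, 3
shape = hdf5_shape((n1, n2))  # N2 (slowest) is listed first, N1 (fastest) last

# Every element has the same position in both views of the data.
for i2 in range(n2):
    for i1 in range(n1):
        assert row_major_offset((i2, i1), shape) == resqml_index_1d((i1, i2), (n1, n2))

print(shape)  # (3, 4)
```

Note that the index tuple is reversed along with the shape: the RESQML element (I1, I2) is addressed as (I2, I1) in the HDF5 array.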
NOTE: For a brief introduction on implementing HDF5 in RESQML, see 18 Appendix: HDF5 Implementation Overview.
Lattice Offsets. Geometry and properties use multi-dimensional lattice offset constructions for points and values, respectively. The ordering of the offsets follows the ordering of the indices in the multi-dimensional index calculation and hence is opposite to the ordering of the HDF5 data storage.
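To illustrate the offset ordering, the sketch below generates lattice points in 1D index order. It assumes a simplified lattice with one constant offset vector per dimension, which is an illustrative assumption and not the RESQML schema itself; the function name `lattice_points` is hypothetical.

```python
from itertools import product


def lattice_points(origin, offsets, dims):
    """List lattice points in 1D index order (first dimension fastest).

    `offsets[k]` is the step vector for index I(k+1) and `dims[k]` is its
    size, both listed fastest to slowest (simplifying assumption: each
    dimension has a single constant step vector).
    """
    points = []
    # product() varies its last argument fastest, so iterate over the
    # reversed dimensions and flip each index tuple back to (I1, I2, ...).
    for reversed_idx in product(*(range(n) for n in reversed(dims))):
        idx = reversed_idx[::-1]
        point = tuple(
            o + sum(i * off[c] for i, off in zip(idx, offsets))
            for c, o in enumerate(origin)
        )
        points.append(point)
    return points


pts = lattice_points(
    origin=(0.0, 0.0, 0.0),
    offsets=[(10.0, 0.0, 0.0), (0.0, 5.0, 0.0)],  # offset for I1, then for I2
    dims=(3, 2),                                   # N1=3, N2=2
)
print(pts[4])  # 1D index 4 is I1=1, I2=1 -> (10.0, 5.0, 0.0)
```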
An example of the use of multi-dimensional arrays. The coordinate line nodes on a faulted grid form a 2D array, where N1=coordinateLineCount and N2=NKL. However, the dimensionality of an array may vary with context; for example, the coordinate lines themselves may be either a 1D or a 2D array. In the special case of an unfaulted grid, the coordinate lines are a 2D array indexed by NIL x NJL, and the coordinate line nodes are a 3D array indexed by NIL x NJL x NKL.
Lists. When points or multi-dimensional (count>1) property values are stored in HDF5, this introduces an additional dimension, which is always the fastest. For example:
- An N-dimensional array of points3d is stored as an N*3 HDF5 array of coordinates.
- Alternatively, an N-dimensional array of points2d is stored as an N*2 HDF5 array of coordinates.
- An array of N facies proportion curves (count>1) is stored as an N*count HDF5 array of values.
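The effect of the list dimension on the HDF5 shape can be sketched as follows. The helper name `hdf5_shape_with_list` is hypothetical, and the sketch assumes that no extra dimension is added when count is 1.

```python
def hdf5_shape_with_list(resqml_dims, count):
    """HDF5 shape for RESQML sizes (fastest to slowest) holding `count` values per element."""
    shape = tuple(reversed(resqml_dims))
    # The per-element list (e.g. x, y, z) is the fastest dimension,
    # so it is listed last in the HDF5 shape.
    # Assumption: count == 1 adds no extra dimension.
    return shape + (count,) if count > 1 else shape


print(hdf5_shape_with_list((4, 3), 3))   # points3d on a 4 x 3 array -> (3, 4, 3)
print(hdf5_shape_with_list((4, 3), 2))   # points2d on a 4 x 3 array -> (3, 4, 2)
print(hdf5_shape_with_list((100,), 5))   # 100 curves with count=5 -> (100, 5)
```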
BUSINESS RULE: To facilitate data validation and hyper-slabbing of the data, RESQML requires that data be stored with the maximum dimensionality possible. For the example of coordinate line nodes given earlier, this rule implies that a 3D array format is used for an unfaulted grid, rather than the 2D array format, which would also be possible.
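Applied to the coordinate line nodes, the rule might be sketched as below. The function name and its parameters are hypothetical, and the sketch assumes the nodes are stored as points3d, so that a fastest dimension of size 3 is appended.

```python
def coordinate_line_node_hdf5_shape(nil, njl, nkl, faulted, coordinate_line_count=None):
    """HDF5 shape of the coordinate line nodes, using the maximum dimensionality."""
    if faulted:
        if coordinate_line_count is None:
            raise ValueError("coordinate_line_count is required for a faulted grid")
        # Faulted grid: 2D array with N1=coordinateLineCount, N2=NKL.
        resqml_dims = (coordinate_line_count, nkl)
    else:
        # Unfaulted grid: 3D array indexed by NIL x NJL x NKL.
        resqml_dims = (nil, njl, nkl)
    # Reverse for HDF5 (slowest first), then append the xyz list dimension.
    return tuple(reversed(resqml_dims)) + (3,)


print(coordinate_line_node_hdf5_shape(3, 4, 5, faulted=False))
# (5, 4, 3, 3)
print(coordinate_line_node_hdf5_shape(3, 4, 5, faulted=True, coordinate_line_count=14))
# (5, 14, 3)
```

Keeping the NIL and NJL dimensions separate for the unfaulted case allows a reader to select, for example, a single I or J slice of nodes with one hyperslab.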