18.4.1 Enabling Compression

Topic Version: 1
Published: 09/11/2015
For Standard: RESQML v2.0.1

RESQML is designed to store large quantities of data, and HDF5 can help with this by compressing the data as it is written to a file. Compression is not enabled by default; it must be requested when each dataset is created.

HDF5’s implementation of compression provides the writer with a trade-off between compression efficiency and access speed for readers. It does this by splitting the data into fixed-sized “chunks” and compressing each chunk individually.

The larger a chunk is, the more efficiently it compresses. This efficiency comes at the cost of read performance, however, because HDF5 must read and decompress an entire chunk into memory to access any part of the data stored within it.

Smaller chunks improve the reading efficiency of HDF5, because less data must be read and decompressed into memory to get at a particular part of the data. The downside is that the compression is less efficient, so the resulting files will be larger.

When determining a chunk size to use, consider how the data will be accessed. For example, a likely use case for applications reading explicit grids is to extract only particular layers out of the data using hyperslabbing. A chunking setup that supports this scenario is to make a chunk for each layer. This approach makes accessing the data a layer at a time maximally efficient, while still giving HDF5 a large enough block of data to compress.

Enabling compression involves creating an HDF5 dataset creation property list, which is then passed to H5Dcreate when the dataset is created. This property list holds the chunk dimensions and the compression settings.

hid_t DatasetParameterID = H5Pcreate(H5P_DATASET_CREATE);
if (DatasetParameterID >= 0)
{
    /* Set the chunk dimensions, then enable gzip (deflate) compression, level 1. */
    herr_t e = H5Pset_chunk(DatasetParameterID, ChunkRank, ChunkDimensions);
    e = H5Pset_deflate(DatasetParameterID, 1);
}