1.2.3 Avro Serialization

Topic Version: 1    Published: 10/31/2016
For Standard: ETP v1.1

The serialization of messages in ETP follows a subset of the Apache Avro specification (http://avro.apache.org/docs/current/spec.html). Avro is a system for defining schemas and serializing data objects according to those schemas. It was developed as part of the Hadoop® project to provide a flexible, high-speed serialization mechanism for processing big data. The ETP Workgroup selected Avro after reviewing several similar serialization systems. ETP uses only a subset of Avro's functionality, as follows:

  • ETP does define all messages using the Avro schema file format. The schemas are formally defined as UML class models in Enterprise Architect (EA); both the specifications in this document and the Avro schema files are generated from these EA models.
  • ETP does serialize all messages on the wire in accordance with the Avro serialization rules.
  • ETP does not use the Avro RPC facility.
  • ETP does not use the Avro container file facility.
  • ETP does use the additional schema attributes (permissible in Avro) to define message and protocol metadata.
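
As an illustration of the last point, the following minimal Python sketch parses a record schema that carries extra attributes alongside the ones Avro itself defines. The record name, namespace, field, and the attribute names `protocol` and `messageType` are assumptions chosen for illustration; they are not copied from the ETP schema files. The Avro specification permits such additional attributes as metadata, provided they do not affect the serialized form.

```python
import json

# Hypothetical schema: the record name, namespace, field, and the extra
# "protocol" / "messageType" attributes are illustrative assumptions.
SCHEMA_JSON = """
{
  "type": "record",
  "namespace": "Example.Protocol",
  "name": "ExampleMessage",
  "protocol": "0",
  "messageType": "1",
  "fields": [
    {"name": "applicationName", "type": "string"}
  ]
}
"""

# Attribute names that the Avro specification defines for a record schema.
AVRO_RECORD_ATTRIBUTES = {"type", "name", "namespace", "doc", "aliases", "fields"}


def split_schema(schema_text):
    """Separate Avro-defined record attributes from additional metadata."""
    schema = json.loads(schema_text)
    core = {k: v for k, v in schema.items() if k in AVRO_RECORD_ATTRIBUTES}
    metadata = {k: v for k, v in schema.items() if k not in AVRO_RECORD_ATTRIBUTES}
    return core, metadata


core, metadata = split_schema(SCHEMA_JSON)
print(metadata)
```

A generic Avro library would serialize this record using only the `fields` list; the extra attributes are available to the application for purposes such as message routing.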

The Avro specification supports the use of both binary and JSON (JavaScript Object Notation) encoding of data. ETP also supports the use of both, with the following caveats:

  • All messages within a given ETP session must use the same encoding (binary or JSON). The encoding is negotiated as described in the discussion of Protocol 0 below.
  • Agents are not required to support both encodings. This exception is primarily to allow smaller, resource-constrained implementations to use only one encoding.
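
To make the difference between the two encodings concrete, the sketch below encodes the same record both ways using only the Python standard library. The record schema (a `long` field named `id` followed by a `string` field named `name`) is a hypothetical example, not an ETP message. The binary rules shown are those of the Avro specification: a `long` is written as a zigzag-encoded variable-length integer, a `string` as its byte length (a `long`) followed by UTF-8 bytes, and a record as its fields concatenated in schema order.

```python
import json


def encode_long(n):
    """Encode a signed 64-bit integer as Avro does: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small negative values to small positives
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s):
    """Encode a string as a length prefix (Avro long) plus UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record_binary(record):
    # Hypothetical schema: fields "id" (long) then "name" (string), in that order.
    return encode_long(record["id"]) + encode_string(record["name"])


def encode_record_json(record):
    # In Avro's JSON encoding, a record is a JSON object keyed by field name.
    obj = {"id": record["id"], "name": record["name"]}
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


record = {"id": 3, "name": "hi"}
binary_bytes = encode_record_binary(record)
json_bytes = encode_record_json(record)
print(binary_bytes.hex())   # 06046869
print(json_bytes.decode())  # {"id":3,"name":"hi"}
```

The binary form carries no field names or type tags, which is why it is more compact (4 bytes versus 20 here) and why both ends of a session must agree on one encoding in advance.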

Unlike XML, Avro has no concept of a well-formed vs. valid document, nor a generic document node model; thus, it is not possible to deserialize an Avro document without knowledge of its schema. For this release of ETP, it is assumed that all parties have prior knowledge of the schemas involved. In future releases, capabilities will be added to exchange version-specific schemas when the session is negotiated, which will allow an agent to consume any ETP message, even if it cannot use all of the information in the message.
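
The dependence on the schema can be seen in a small Python sketch. The four-byte payload below is the Avro binary encoding of a hypothetical record with a `long` field of value 3 followed by a `string` field of value "hi"; the record layout and the helper names are assumptions for illustration. The same bytes decode without error under a different (wrong) schema, which shows that the data alone does not reveal its own structure.

```python
def read_long(buf, pos):
    """Decode one Avro long (base-128 varint, then zigzag) starting at pos."""
    shift = 0
    raw = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1), pos


def read_string(buf, pos):
    """Decode a length-prefixed UTF-8 string starting at pos."""
    length, pos = read_long(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def decode(buf, field_types):
    """Decode buf as a record whose field types are given in schema order."""
    readers = {"long": read_long, "string": read_string}
    values, pos = [], 0
    for field_type in field_types:
        value, pos = readers[field_type](buf, pos)
        values.append(value)
    return values


payload = bytes.fromhex("06046869")

# Decoded with the schema the writer used: a long followed by a string.
as_long_string = decode(payload, ["long", "string"])
# Decoded with a wrong schema: four longs. No error is raised.
as_four_longs = decode(payload, ["long", "long", "long", "long"])

print(as_long_string)  # [3, 'hi']
print(as_four_longs)   # [3, 2, 52, -53]
```

Both interpretations consume the payload exactly, so a reader cannot detect the mismatch from the bytes themselves; it must already know which schema applies.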