3 Schemas

Topic Version: 1
Published: 10/31/2016
For Standard: ETP v1.1

This section is normative.

This section contains the formal UML and Avro schema definitions of the messages that are exchanged as part of the Energistics Transfer Protocol (ETP). These are defined using the following UML stereotypes and conventions:

  • Enumeration: Enumerated values are defined in the schemas as a list of literal names, and serialized on the wire as an integer value. Avro schemas do not allow an explicit integer to be assigned to an enumeration literal; the serialized value is the literal's zero-based position in the list, so enumerations are order dependent. This has two design implications. First, the UML tool used for modeling must be capable of preserving this ordering. Second, schema authors must keep the ordering consistent between versions to provide maximum interoperability.
  • Record: An Avro record is roughly equivalent to a C or C++ struct. The record stereotype is used to designate low-level data types that are composed to create messages. For example, the DateTime record is used to define how a date is transferred in all messages.
  • Message: Represents a top-level message that can be sent between client and server. Messages are identical to records in all ways, except that they are designated as being transferable as a top-level element in ETP.
  • Union: Used to represent a type that can be any one of a selected list of types. Each type is reflected in the UML as an attribute of the union class itself. A union corresponds roughly to the xsd:choice element in XML schemas.
  • Map: UML has little native support for maps. However, in ETP the modeling can be simplified because all Avro maps have string keys. So in UML, a map type is simply defined as a collection of type X, where X is the value type of the map, and the keys are assumed to be strings. These concepts are reflected in the Avro schema generation rules.
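
The order dependence of enumerations can be illustrated with a minimal Python sketch. The ExampleRole enumeration and its literals are hypothetical and are not taken from the ETP schemas. The sketch assumes a receiver that interprets the integer on the wire using only its own copy of the schema, and it shows only the ordinal lookup, not the full Avro binary encoding.

```python
# Hypothetical enumeration used only for illustration; it is not an ETP type.
ROLE_V1 = {
    "type": "enum",
    "name": "ExampleRole",
    "symbols": ["producer", "consumer", "store"],
}

# A later revision that inserts a literal in the middle (shifts later ordinals).
ROLE_V2_INSERTED = {
    "type": "enum",
    "name": "ExampleRole",
    "symbols": ["producer", "observer", "consumer", "store"],
}

# A later revision that appends the literal (existing ordinals are preserved).
ROLE_V2_APPENDED = {
    "type": "enum",
    "name": "ExampleRole",
    "symbols": ["producer", "consumer", "store", "observer"],
}


def enum_ordinal(schema, symbol):
    """Return the zero-based position that represents the literal on the wire."""
    return schema["symbols"].index(symbol)


def enum_symbol(schema, ordinal):
    """Return the literal that a given schema associates with an ordinal."""
    return schema["symbols"][ordinal]


# A sender using the first revision writes "consumer" as its position, 1.
wire_value = enum_ordinal(ROLE_V1, "consumer")

# A receiver on the "inserted" revision misreads the same integer.
decoded_inserted = enum_symbol(ROLE_V2_INSERTED, wire_value)  # "observer"

# A receiver on the "appended" revision still reads it correctly.
decoded_appended = enum_symbol(ROLE_V2_APPENDED, wire_value)  # "consumer"
```

Appending new literals at the end of the list is therefore the safer way to evolve an enumeration under this assumption.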

The Avro schemas, in JSON form, are produced automatically by the code-generation process in Enterprise Architect (EA). This built-in code-generation process creates one .avsc file per class, in a folder structure that matches the package hierarchy. A second script can be used to generate all of the schemas in a single Avro Protocol (.avpr) file. Note that while the .avpr format is a convenient way to place all of the schemas in a single file, ETP DOES NOT use the Avro RPC protocol.
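
The relationship between the per-class .avsc files and a single .avpr file can be sketched as follows. This is not the EA script; it is a minimal illustration that assumes a hypothetical folder layout and two hypothetical schemas (ExampleVersion and ExampleHello). It relies only on the general shape of an Avro protocol document, which carries named schemas in a "types" array. The "messages" object is left empty here because ETP does not use Avro RPC.

```python
import json
import tempfile
from pathlib import Path


def collect_avsc(root):
    """Load every .avsc file under root, in sorted path order."""
    paths = sorted(root.rglob("*.avsc"))
    return [json.loads(p.read_text(encoding="utf-8")) for p in paths]


def build_avpr(protocol, namespace, types):
    """Wrap a list of schemas in an Avro protocol document with no RPC messages."""
    return {
        "protocol": protocol,
        "namespace": namespace,
        "types": types,
        "messages": {},
    }


# Hypothetical schemas, written out one file per class in package folders.
version_schema = {
    "type": "record",
    "namespace": "Example.Datatypes",
    "name": "ExampleVersion",
    "fields": [{"name": "major", "type": "int"}],
}
hello_schema = {
    "type": "record",
    "namespace": "Example.Messages",
    "name": "ExampleHello",
    "fields": [{"name": "version", "type": "Example.Datatypes.ExampleVersion"}],
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for schema in (version_schema, hello_schema):
        folder = root.joinpath(*schema["namespace"].split("."))
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / (schema["name"] + ".avsc")
        target.write_text(json.dumps(schema), encoding="utf-8")
    avpr = build_avpr("ExampleProtocol", "Example", collect_avsc(root))

avpr_text = json.dumps(avpr, indent=2)
```

Sorted path order is used here only to make the output deterministic. A real generator would need to make sure that each named type appears before the types that reference it.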

The primitives used for attribute types in the UML class definitions are exactly those used in Avro. The set of primitive type names is:

  • null: no value
  • boolean: a binary value
  • int: 32-bit signed integer
  • long: 64-bit signed integer
  • float: single-precision (32-bit) IEEE 754 floating-point number
  • double: double-precision (64-bit) IEEE 754 floating-point number
  • bytes: sequence of 8-bit unsigned bytes
  • string: Unicode character sequence
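
As an illustration of the value ranges these primitives imply, the following Python sketch checks whether a Python value fits a given primitive. The conforms function and the mapping to Python types are assumptions made for this example and are not part of ETP or of any Avro library. The sketch uses the Avro spelling boolean, and it deliberately rejects Python bool values for int and long, even though bool is a subclass of int in Python.

```python
import struct

INT_MIN, INT_MAX = -(2 ** 31), 2 ** 31 - 1
LONG_MIN, LONG_MAX = -(2 ** 63), 2 ** 63 - 1


def _is_integer(value):
    """True for Python integers, excluding bool."""
    return isinstance(value, int) and not isinstance(value, bool)


def _fits_float32(value):
    """True if a Python float can be packed as a 32-bit IEEE 754 value."""
    if not isinstance(value, float):
        return False
    try:
        struct.pack("<f", value)
    except OverflowError:
        return False
    return True


# One predicate per primitive type name.
CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: _is_integer(v) and INT_MIN <= v <= INT_MAX,
    "long": lambda v: _is_integer(v) and LONG_MIN <= v <= LONG_MAX,
    "float": _fits_float32,
    "double": lambda v: isinstance(v, float),
    "bytes": lambda v: isinstance(v, (bytes, bytearray)),
    "string": lambda v: isinstance(v, str),
}


def conforms(primitive, value):
    """Return True if value is acceptable for the named primitive type."""
    return CHECKS[primitive](value)
```

For example, conforms("int", 2 ** 31) is False because the value exceeds the 32-bit signed range, while conforms("long", 2 ** 31) is True.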

Conversion from the Avro schema to language-specific proxy classes is described later in this document.