
[Xlang][Spec] Fory XLang Union Encoding Spec #3191

@chaokunyang


This proposal specifies the IDL union encoding in the Fory xlang serialization format using three union type IDs:

  • UNION (31): union value without embedded union schema identity (schema known from context)
  • TYPED_UNION (32): union value with embedded registered numeric type id
  • NAMED_UNION (33): union value with embedded registered type name / shared typedef

This design moves union schema identity into Type Meta, which makes unions consistent with the STRUCT/ENUM/EXT patterns
and natural to carry inside Any.


1. IDL Syntax

1.1 Union definition

union Contact [id=0] {
  string email = 1;
  int32  phone = 2;
}

Rules:

  • Each union alternative MUST have a stable tag number (= 1, = 2, ...).
  • Tag numbers MUST be unique within the union.
  • Tag numbers SHOULD follow protobuf evolution rules: do not reuse removed tag numbers.
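The tag-number rules above can be checked mechanically. The following Python sketch is illustrative only: `validate_union_tags` and its `(name, tag)` and `reserved` inputs are hypothetical, not part of any Fory compiler, and it treats reuse of a removed tag (a SHOULD rule) as a hard error.

```python
def validate_union_tags(alternatives, reserved=frozenset()):
    """Return rule violations for a list of (name, tag) alternatives (hypothetical helper)."""
    errors = []
    seen = {}
    for name, tag in alternatives:
        # case_id is written as varuint32, so a tag must fit in 0 .. 2**32 - 1.
        if not isinstance(tag, int) or not 0 <= tag < 2**32:
            errors.append(f"{name}: tag {tag!r} does not fit in a varuint32")
            continue
        if tag in seen:
            errors.append(f"{name}: tag {tag} already used by {seen[tag]}")
        else:
            seen[tag] = name
        if tag in reserved:
            # Models the SHOULD rule (do not reuse removed tags) as an error.
            errors.append(f"{name}: tag {tag} belonged to a removed alternative")
    return errors
```

For the `Contact` union, `validate_union_tags([("email", 1), ("phone", 2)])` returns an empty list.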

1.2 Union usage

message Person [id=1] {
  Contact contact = 1;
}

2. Mapping from Other IDLs

2.1 Protobuf oneof → Fory union

Protobuf:

message Person {
  oneof contact {
    string email = 1;
    int32  phone = 2;
  }
}

Mapping:

  • The oneof group maps to a Fory union.
  • Each oneof field number becomes the union alternative tag number (case_id).

2.2 FlatBuffers union → Fory union

FlatBuffers:

union Equipment { Weapon, Monster }

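Mapping example (FlatBuffers implicitly reserves discriminator 0 for NONE and numbers the members 1, 2, ... in
declaration order):

union Equipment {
  Weapon  weapon  = 1;
  Monster monster = 2;
}

  • FlatBuffers discriminator values map to union alternative tag numbers (case_id), so the reserved NONE value (0) is
    not used as a tag.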
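A hypothetical helper that derives the Fory union from a FlatBuffers declaration might look like this. It assumes the FlatBuffers union declares no explicit discriminator values, and deriving field names by lower-casing the type name is an arbitrary choice; neither `flatbuffers_union_to_fory` nor `render_fory_union` exists in Fory or FlatBuffers.

```python
def flatbuffers_union_to_fory(members):
    """Map FlatBuffers union members to (field_name, type_name, case_id) tuples."""
    alternatives = []
    # FlatBuffers reserves discriminator 0 for NONE, so members start at 1.
    for discriminator, type_name in enumerate(members, start=1):
        # Lower-casing the type name to get a field name is an arbitrary choice.
        alternatives.append((type_name.lower(), type_name, discriminator))
    return alternatives


def render_fory_union(name, alternatives):
    """Render the mapped alternatives in the union syntax from section 1.1."""
    lines = [f"union {name} {{"]
    for field, type_name, case_id in alternatives:
        lines.append(f"  {type_name} {field} = {case_id};")
    lines.append("}")
    return "\n".join(lines)
```

Calling `render_fory_union("Equipment", flatbuffers_union_to_fory(["Weapon", "Monster"]))` yields a union with `weapon = 1` and `monster = 2`.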

3. Type IDs

3.1 Internal Type ID Table Updates

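| Type ID | Name        | Description                                                      |
|---------|-------------|------------------------------------------------------------------|
| 31      | UNION       | Union value, schema identity is NOT embedded (context required). |
| 32      | TYPED_UNION | Union value with embedded registered numeric union type id.      |
| 33      | NAMED_UNION | Union value with embedded union type name / shared typedef.      |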

3.2 Type Meta Encoding

All type IDs are written as varuint32, following the xlang Type Meta rules.

  • UNION (31): no additional type meta payload
  • TYPED_UNION (32): followed by union_type_id (varuint32)
  • NAMED_UNION (33): followed by named-type meta payload:
    • meta share disabled: namespace + type_name (meta strings)
    • meta share enabled: shared TypeDef marker + TypeDef body (per xlang meta share)

Notes:

  • union_type_id uses the standard Full Type ID rule:
    • Full Type ID = (user_type_id << 8) | internal_type_id
  • How union schemas are registered/mapped is implementation-defined, but the numeric union_type_id MUST be stable
    between producer and consumer when TYPED_UNION is used.
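To make the byte layout concrete, here is a Python sketch of writing union type meta for UNION and TYPED_UNION. It assumes varuint32 is the usual LEB128-style encoding (7 payload bits per byte, high bit set when more bytes follow); the helper names are hypothetical, and the NAMED_UNION payload is deliberately left out.

```python
from typing import Optional

UNION, TYPED_UNION, NAMED_UNION = 31, 32, 33


def write_varuint32(value: int) -> bytes:
    """Assumed LEB128-style varuint32: 7 payload bits per byte, high bit = more."""
    if not 0 <= value < 1 << 32:
        raise ValueError("value does not fit in varuint32")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def full_type_id(user_type_id: int, internal_type_id: int) -> int:
    """Full Type ID = (user_type_id << 8) | internal_type_id."""
    return (user_type_id << 8) | internal_type_id


def write_union_type_meta(type_id: int, union_type_id: Optional[int] = None) -> bytes:
    meta = write_varuint32(type_id)
    if type_id == UNION:
        return meta  # schema comes from context; nothing else is written
    if type_id == TYPED_UNION:
        if union_type_id is None:
            raise ValueError("TYPED_UNION requires a union_type_id")
        return meta + write_varuint32(union_type_id)
    # NAMED_UNION needs meta strings or a shared TypeDef; out of scope here.
    raise NotImplementedError(f"type meta for type_id {type_id} is not sketched")
```

For example, `write_union_type_meta(TYPED_UNION, full_type_id(5, TYPED_UNION))` yields the bytes `20 a0 0a`. Using TYPED_UNION (32) as the internal_type_id inside union_type_id is an assumption made for this example; the text above only fixes the shift-and-or rule.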

4. Union Value Payload Encoding

A union value payload is encoded as:

| case_id (varuint32) | case_value (encoded as Any-style value) |

4.1 case_id

  • case_id is the union alternative tag number from the FDL, protobuf, or FlatBuffers mapping.
  • It is encoded as varuint32.
  • case_id MUST be stable for evolution and MUST NOT be reused for a different alternative.

4.2 case_value (MUST be encoded as Any-style value)

To guarantee that unknown alternatives can be skipped, case_value MUST be encoded as a full Fory value,
equivalent to encoding the value as if it were stored in Any/UNKNOWN:

| field_ref_meta | field_value_type_meta | field_value_bytes |

Where:

  • field_ref_meta is the standard reference meta (NULL/REF/NOT_NULL/REF_VALUE).
  • field_value_type_meta is the standard xlang Type Meta (a varuint32 type_id plus optional meta payload).
  • field_value_bytes is the value bytes encoded according to field_value_type_meta.

This is required even for primitive and string values (e.g., INT32, STRING) so that skipping is always possible.
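The following Python sketch encodes a union payload whose alternatives are strings. Several details are assumptions rather than spec text: the LEB128-style varuint32, the ref-meta flag values (NULL = -3, NOT_NULL = -1, written as one signed byte, mirroring Fory's reference flags as we understand them), and the placeholder `STRING_TYPE_ID` with a simplified length-prefixed UTF-8 encoding in place of Fory's real string format.

```python
NULL_FLAG, NOT_NULL_VALUE_FLAG = -3, -1   # assumed ref-meta flag values
STRING_TYPE_ID = 0x7F                     # placeholder type id, not the real xlang id


def write_varuint32(value: int) -> bytes:
    """Assumed LEB128-style varuint32: 7 payload bits per byte, high bit = more."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def write_flag(flag: int) -> bytes:
    # Ref meta is assumed to be a single signed byte.
    return flag.to_bytes(1, "little", signed=True)


def encode_union_payload(case_id: int, text) -> bytes:
    """| case_id | field_ref_meta | field_value_type_meta | field_value_bytes |"""
    out = bytearray(write_varuint32(case_id))
    if text is None:
        out += write_flag(NULL_FLAG)              # null: no type meta, no bytes
        return bytes(out)
    data = text.encode("utf-8")
    out += write_flag(NOT_NULL_VALUE_FLAG)
    out += write_varuint32(STRING_TYPE_ID)        # type meta is written even for strings
    out += write_varuint32(len(data)) + data      # placeholder string encoding
    return bytes(out)
```

`encode_union_payload(1, "a@b")` produces `01 ff 7f 03 61 40 62`: case_id, NOT_NULL flag, type meta, then the value bytes. A null alternative collapses to the case_id plus the NULL flag.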


5. Full Wire Layout Examples

5.1 UNION (schema known from context)

Used when the deserializer already knows the union schema (e.g., the field is declared as a specific union type):

| ... outer ref meta ... | type_id=UNION(31) | case_id | case_value(any-style) |

5.2 TYPED_UNION (schema embedded by numeric id)

Used when union schema is not known from context, e.g., union is stored in Any:

| ... outer ref meta ... | type_id=TYPED_UNION(32) | union_type_id | case_id | case_value(any-style) |

5.3 NAMED_UNION (schema embedded by name/typedef)

Used when union schema is resolved by name or via meta share TypeDef:

| ... outer ref meta ... | type_id=NAMED_UNION(33) | (namespace,type_name OR typedef marker) | case_id | case_value(any-style) |
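The UNION and TYPED_UNION layouts differ only in the type meta written between the outer ref meta and the payload. This Python sketch wraps an already encoded payload, assuming LEB128-style varuint32 and a NOT_NULL outer ref flag of -1 stored in one signed byte; `frame_union` is a hypothetical name and NAMED_UNION is not covered.

```python
UNION, TYPED_UNION = 31, 32
NOT_NULL_VALUE_FLAG = -1  # assumed outer ref-meta flag value


def write_varuint32(value: int) -> bytes:
    """Assumed LEB128-style varuint32."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def frame_union(payload: bytes, union_type_id=None) -> bytes:
    """Prefix a union payload with outer ref meta and union type meta.
    union_type_id=None selects UNION (31); otherwise TYPED_UNION (32)."""
    out = bytearray(NOT_NULL_VALUE_FLAG.to_bytes(1, "little", signed=True))
    if union_type_id is None:
        out += write_varuint32(UNION)
    else:
        out += write_varuint32(TYPED_UNION) + write_varuint32(union_type_id)
    out += payload
    return bytes(out)
```

With an example union_type_id of 1312 (two varuint32 bytes), the TYPED_UNION frame is two bytes longer than the UNION frame for the same payload.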

6. Decoding Rules

6.1 High-level decoding algorithm

  1. Read outer ref meta (per standard rules).
  2. Read type_id as varuint32.
  3. If type_id == UNION(31):
    • Union schema MUST be provided by context (declared field type / target type).
  4. If type_id == TYPED_UNION(32):
    • Read union_type_id (varuint32) and resolve the union schema from the registry.
  5. If type_id == NAMED_UNION(33):
    • Read named-type meta (name strings or TypeDef marker) and resolve the union schema.
  6. Read case_id (varuint32).
  7. Read case_value as Any-style value:
    • Read field_ref_meta
    • If non-null and not a reference:
      • Read field_value_type_meta
      • Read/construct the value using that type meta
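The decoding algorithm above can be sketched in Python as follows. This illustration assumes LEB128-style varuint32, Fory-style ref flags (NULL = -3, REF = -2) in a single signed byte, a placeholder string type id with a length-prefixed UTF-8 encoding, and a plain dict as the union registry. `Reader` and `read_union` are hypothetical names, and NAMED_UNION and reference resolution are left unimplemented.

```python
UNION, TYPED_UNION, NAMED_UNION = 31, 32, 33
NULL_FLAG, REF_FLAG = -3, -2   # assumed Fory-style ref-meta flags (one signed byte)
STRING_TYPE_ID = 0x7F          # placeholder value type id, not the real xlang id


class Reader:
    """Tiny cursor over a bytes buffer (hypothetical helper)."""

    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def byte(self) -> int:
        b = self.data[self.pos]
        self.pos += 1
        return b

    def signed_byte(self) -> int:
        b = self.byte()
        return b - 256 if b >= 128 else b

    def varuint32(self) -> int:
        # Assumed LEB128-style: low 7 bits first, high bit means "more bytes".
        result, shift = 0, 0
        while True:
            b = self.byte()
            result |= (b & 0x7F) << shift
            if b < 0x80:
                return result
            shift += 7

    def raw(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk


def read_union(r: Reader, registry: dict, context_schema=None):
    """Steps 1-7 of section 6.1. Returns (schema, case_id, value), or None for a null union."""
    # Step 1: outer ref meta. For brevity, the union is assumed never to be a back-reference.
    if r.signed_byte() == NULL_FLAG:
        return None
    type_id = r.varuint32()                      # step 2
    if type_id == UNION:                         # step 3: schema from context
        if context_schema is None:
            raise ValueError("UNION requires the schema from context")
        schema = context_schema
    elif type_id == TYPED_UNION:                 # step 4: registry lookup
        schema = registry[r.varuint32()]
    elif type_id == NAMED_UNION:                 # step 5: not sketched
        raise NotImplementedError("named-type meta is not sketched")
    else:
        raise ValueError(f"type_id {type_id} is not a union type")
    case_id = r.varuint32()                      # step 6
    flag = r.signed_byte()                       # step 7: field_ref_meta
    if flag == NULL_FLAG:
        return schema, case_id, None
    if flag == REF_FLAG:
        raise NotImplementedError("reference resolution is not sketched")
    value_type = r.varuint32()                   # field_value_type_meta
    if value_type != STRING_TYPE_ID:
        raise NotImplementedError(f"value type {value_type} is not sketched")
    length = r.varuint32()
    return schema, case_id, r.raw(length).decode("utf-8")
```

With `registry = {1312: "Contact"}`, the bytes `ff 20 a0 0a 01 ff 7f 03 61 40 62` decode to `("Contact", 1, "a@b")`.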

6.2 Unknown case_id handling (forward compatibility)

If the resolved union schema does not contain case_id, the decoder MUST still consume the case value:

  1. Read field_ref_meta.
  2. If non-null and not a reference:
    • Read field_value_type_meta.
    • Call the standard skipValue(field_value_type_meta.type_id) to skip field_value_bytes.

This guarantees that adding new union alternatives is forward compatible.
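Skipping needs only the ref meta and the type meta, never the union schema. The Python sketch below assumes LEB128-style varuint32, Fory-style ref flags (NULL = -3, REF = -2) in one signed byte, that a REF flag is followed by a varuint32 reference id, and a hypothetical `SKIPPERS` table standing in for skipValue with a single placeholder string type.

```python
NULL_FLAG, REF_FLAG = -3, -2   # assumed ref-meta flags, one signed byte
STRING_TYPE_ID = 0x7F          # placeholder value type id


def read_varuint32(data: bytes, pos: int):
    """Assumed LEB128-style varuint32; returns (value, new_pos)."""
    result, shift = 0, 0
    while True:
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if b < 0x80:
            return result, pos
        shift += 7


def skip_string(data: bytes, pos: int) -> int:
    # Placeholder string layout: varuint32 length followed by UTF-8 bytes.
    length, pos = read_varuint32(data, pos)
    return pos + length


# Hypothetical stand-in for skipValue: type_id -> skipper(data, pos) -> new pos.
SKIPPERS = {STRING_TYPE_ID: skip_string}


def skip_case_value(data: bytes, pos: int) -> int:
    """Consume an Any-style case_value without knowing its union alternative."""
    flag = data[pos] - 256 if data[pos] >= 128 else data[pos]   # field_ref_meta
    pos += 1
    if flag == NULL_FLAG:
        return pos                                # nothing follows a null
    if flag == REF_FLAG:
        _, pos = read_varuint32(data, pos)        # assumed: ref id follows REF
        return pos
    type_id, pos = read_varuint32(data, pos)      # field_value_type_meta
    return SKIPPERS[type_id](data, pos)           # skipValue(type_id)
```

For input `03 ff 7f 05 68 65 6c 6c 6f 2a`, reading case_id 3 and then calling `skip_case_value(data, 1)` returns position 9, leaving the reader on the trailing `2a` byte that follows the unknown alternative.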


7. When to Use UNION vs TYPED_UNION vs NAMED_UNION

  • Use UNION (31) when the union schema is known from context:
    • struct fields declared as a union type
    • collections/maps with declared union element/value type
    • explicit deserialize<TUnion>()
  • Use TYPED_UNION (32) when the union schema is not known from context and numeric registration is available:
    • union stored in Any
    • union stored in fully dynamic UNKNOWN fields
  • Use NAMED_UNION (33) when numeric registration is not available or name-based resolution is preferred:
    • unregistered union schemas
    • cross-language name mapping
    • meta share environments using shared TypeDef

Implementations MAY always write TYPED_UNION/NAMED_UNION for simplicity, but UNION is recommended wherever context
exists because it produces smaller payloads.
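The selection rules above reduce to a small decision function. This is a hypothetical Python helper, not a Fory API; its three boolean inputs stand for facts a serializer would know about the value being written.

```python
UNION, TYPED_UNION, NAMED_UNION = 31, 32, 33


def choose_union_type_id(schema_known_from_context: bool,
                         numeric_id_registered: bool,
                         prefer_names: bool = False) -> int:
    """Pick the union type id following the rules above (hypothetical helper)."""
    if schema_known_from_context:
        # Declared field type, declared element/value type, or deserialize<TUnion>().
        return UNION
    if numeric_id_registered and not prefer_names:
        # Dynamic position (Any / UNKNOWN) with a stable numeric registration.
        return TYPED_UNION
    # Unregistered schema, name-based mapping, or shared TypeDef environments.
    return NAMED_UNION
```

A union stored in Any whose schema was registered with a numeric id gets `choose_union_type_id(False, True) == 32`.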


8. Compatibility and Evolution Notes

  • case_id MUST be treated as a stable identifier (like protobuf field number).
  • Adding a new alternative is forward compatible:
    • old readers can skip an unknown case_id because the case value carries standard type meta.
  • Removing an alternative is backward compatible if:
    • removed case_id is not reused
    • readers treat unknown alternatives as “present but ignored”.

9. Summary

  • Introduce three internal union type IDs: UNION(31), TYPED_UNION(32), NAMED_UNION(33).
  • Union schema identity is carried in Type Meta for typed/named unions, consistent with other user-defined types.
  • Union payload is always:
    • case_id(varuint32) + case_value encoded as Any-style value (ref meta + type meta + value).
  • Unknown union alternatives can always be skipped safely.

Related Issues

#3027
