DRAFT, NOT READY FOR PUBLICATION: Interface Description Languages should support more asymmetric types
Context
About a month ago I read What Functional Programmers Get Wrong About Systems by Ian Duncan.
I thought this was an excellent article. It exposed me to a lot of academic literature that I was not aware of, tackling what I learned to be the problem of Dynamic Software Updating.
In general, the article took extra time to stress
the risk of type compatibility errors at the boundaries of a program,
typically between databases and other services.
Additionally, it pointed out the use of a compatibility checker to verify whether boundary type changes were or were not safe:
If you have all three [subsystems] (version tags, compatibility functions, and runtime version inventory) you can answer the question that actually matters before every deploy: “Is the version I’m about to deploy compatible with every version currently running?” Not “does it compile?” Not “do the tests pass?” But: “given the actual set of deployments that exist right now, is it safe to add this one?”
Nobody has built the unified tool that answers this across all boundary types simultaneously. (If you are reading this and thinking “that sounds like a startup,” please, by all means.) But the components exist. A deploy pipeline that queries your orchestrator for running image tags, checks your migration history against the schema registry, diffs your GraphQL schema against collected client operations, and runs Buf’s compatibility checks: this is buildable today, with off-the-shelf parts. It is engineering work, not research.
I worked at Canva for three and a bit years, where I worked on what was, for all intents and purposes, a hard fork of Protocol Buffers, together with those three separate subsystems. One addition in that fork was the "Safe proto evolution extensions", which, along with extension-aware compatibility functions, almost entirely eliminated API and RPC compatibility issues.
I feel that not enough was said in the cited article about techniques for making boundary type changes safe, rather than just checking their safety. I hope to achieve that in this article.
Preamble
For the purposes of this article I have chosen to use Protocol Buffers, specifically proto2 syntax, as it is what I am most familiar with. But asymmetric types could be added to any mature IDL (Thrift, Smithy, etc.).
I will now define some terminology that I think is abstract enough to cover any IDL, not just Protocol Buffers:†
A schema may define services, which define methods, that have request types and response types:
service SearchService {
rpc Search(SearchRequest) returns (SearchResponse);
}
A schema may define messages, that have fields with types and labels. Labels may indicate optional,
required or repeated:
message SearchRequest {
required string query = 1;
optional int32 page_number = 2;
optional int32 results_per_page = 3;
optional Corpus corpus = 4;
}
I make a distinction between a "type" in an IDL and a "validation" in generated code. Typically a type generates one validation; an asymmetric type generates two.
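To make this concrete, here is a minimal Python sketch (all names hypothetical) of the difference: a symmetric type shares one validation between its constructor and its deserializer, while an asymmetric type carries two.

```python
# Hypothetical generated code for: required string query = 1.
# Symmetric: the constructor and the deserializer run the SAME validation.
def make_search_request(query):
    if query is None:
        raise ValueError("query is required")
    return {"query": query}

def parse_search_request(payload):
    if payload.get("query") is None:
        raise ValueError("query is required")
    return {"query": payload.get("query")}

# Asymmetric: one type, TWO validations. The constructor still demands the
# field, but the deserializer tolerates its absence (here via a fallback).
def make_search_request_v2(query):
    if query is None:
        raise ValueError("query is required at construction")
    return {"query": query}

def parse_search_request_v2(payload):
    return {"query": payload.get("query", "")}  # deserializer is laxer
```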
A schema may define enums, that have values:
enum Corpus {
CORPUS_UNSPECIFIED = 0;
CORPUS_UNIVERSAL = 1;
CORPUS_WEB = 2;
CORPUS_IMAGES = 3;
CORPUS_LOCAL = 4;
CORPUS_NEWS = 5;
CORPUS_PRODUCTS = 6;
CORPUS_VIDEO = 7;
}
A change to a schema is called an evolution:
message SearchRequest {
required string query = 1;
optional int32 page_number = 2;
}
and creates a new schema version, a specific snapshot of a file:
message SearchRequest {
required string query = 1;
}
message SearchRequest {
required string query = 1;
optional int32 page_number = 2;
}
Schema versions can be generated into classes, which build instances, which serialize to payloads. Classes are built into artifacts and deployed as releases.†
Schema versions have a "liveness" property. A version is live if its classes can still attempt to deserialize a new payload, or if its payloads can still be deserialized by new classes. This implies that if a payload from a specific schema version is persisted to a data store, that version remains live indefinitely.
flowchart TD
A["a schema"]
B0["schema version v0"]
B1["schema version v1"]
B2["schema version v2"]
C["classes"]
D["instances"]
E["payloads"]
F["artifacts"]
G["releases"]
A -- "has" --> B0
A -- "has" --> B1
A -- "has" --> B2
B2 -- "generates" --> C
C -- "deserialize or validate" --> D
D -- "serialize" --> E
C -- "built into" --> F
F -- "deployed as" --> G
E -- "sent to" --> C
A schema can be said to be made "narrower" than its previous version when:
- for messages (a product type), we add a unique new required field
- for enums (a sum type), we remove a value
Conversely, a schema can be made "wider" than its previous version when:
- for messages, we remove a required field
- for enums, we add a value
A method is said to be made narrower if its request or response is made narrower.
A method is said to be made wider if its request or response is made wider.
A service is said to be made narrower if any of its methods are made narrower.
A service is said to be made wider if any of its methods are made wider.
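These definitions can be sketched mechanically. Here is a hypothetical classifier over the set of required field names of a message (the names and representation are my own, not part of any IDL):

```python
# Hypothetical sketch: classify a message evolution by comparing the sets
# of required field names between two schema versions.
def classify_message_evolution(old_required, new_required):
    old_s, new_s = set(old_required), set(new_required)
    if old_s == new_s:
        return "unchanged"
    if old_s < new_s:
        return "narrower"  # required fields added: fewer payloads accepted
    if new_s < old_s:
        return "wider"     # required fields removed: more payloads accepted
    return "both"          # simultaneous narrowing and widening

# Adding required `user` narrows SearchRequest; removing it widens.
assert classify_message_evolution(["query"], ["query", "user"]) == "narrower"
assert classify_message_evolution(["query", "user"], ["query"]) == "wider"
```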
Problems with symmetric types
Assume an environment where we do not control deployment ordering, such as within a given release.
If an engineer needs to strengthen a message, even by the simple act of adding a required field, they create an incompatibility risk:
message SearchRequest {
}
message SearchRequest {
required string user = 1;
}
This also occurs when weakening a message, by removing a required field.
message SearchRequest {
required string user = 1;
}
message SearchRequest {
}
And strengthening an enum, by removing a value.
enum PhoneType {
PHONE_TYPE_UNSPECIFIED = 0;
PHONE_TYPE_MOBILE = 1;
PHONE_TYPE_HOME = 2;
PHONE_TYPE_WORK = 3;
}
enum PhoneType {
PHONE_TYPE_UNSPECIFIED = 0;
PHONE_TYPE_MOBILE = 1;
PHONE_TYPE_HOME = 2;
}
And weakening an enum, by adding a value.
enum PhoneType {
PHONE_TYPE_UNSPECIFIED = 0;
PHONE_TYPE_MOBILE = 1;
PHONE_TYPE_HOME = 2;
}
enum PhoneType {
PHONE_TYPE_UNSPECIFIED = 0;
PHONE_TYPE_MOBILE = 1;
PHONE_TYPE_HOME = 2;
PHONE_TYPE_WORK = 3;
}
One may assume that controlling the ordering of deploys prevents these incompatibilities.
Not true.
A simultaneous narrowing and widening of a schema "deadlocks" the schema such that there is no deployment ordering that prevents incompatibilities.
enum PhoneType {
PHONE_TYPE_UNSPECIFIED = 0;
PHONE_TYPE_MOBILE = 1;
PHONE_TYPE_HOME = 2;
PHONE_TYPE_WORK = 3;
}
enum PhoneType {
PHONE_TYPE_UNSPECIFIED = 0;
PHONE_TYPE_MOBILE = 1;
PHONE_TYPE_HOME = 2;
PHONE_TYPE_WORK = 3;
PHONE_TYPE_FAX = 4;
}
Furthermore, an evolution that takes place in a domain object referenced transitively by two separate messages, one in a request position and one in a response position, also deadlocks.
message User {}
message CreateUserRequest {
required User user = 1;
}
message GetUserResponse {
required User user = 1;
}
message User {
required string email = 1;
}
message CreateUserRequest {
required User user = 1;
}
message GetUserResponse {
required User user = 1;
}
Even ordered deployment is insufficient to avoid this scenario.
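A hypothetical simulation of the User example makes the deadlock visible. Modelling each side's constructor and deserializer ("old" = without email, "new" = with required email), neither deployment ordering round-trips cleanly:

```python
# Hypothetical simulation: User.email becomes required in the new version.
def construct_user(version):
    return {"email": "a@example.com"} if version == "new" else {}

def deserialize_user(payload, version):
    if version == "new" and "email" not in payload:
        raise ValueError("missing required field: email")
    return payload

def round_trips(writer, reader):
    try:
        deserialize_user(construct_user(writer), reader)
        return True
    except ValueError:
        return False

# CreateUserRequest: client writes, server reads.
# GetUserResponse:  server writes, client reads.
def release_ok(client, server):
    return round_trips(client, server) and round_trips(server, client)

# Deploy server first: old client's request lacks email -> server rejects.
# Deploy client first: old server's response lacks email -> client rejects.
assert not release_ok("old", "new")
assert not release_ok("new", "old")
assert release_ok("new", "new")
```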
What is an asymmetric type?
Asymmetric types have different validations on their class's constructor than on its deserializer.
These validations execute at typecheck time or runtime in the constructor, but only at runtime in the deserializer.
A rule of thumb is that the constructor validation must always entail the deserialization validation.
More formally, CV ⇛ DV, which expands to ∀x. CV(x) ⟹ DV(x).
A compatibility checker can verify, statically, whether a schema evolution will cause incompatibilities if the evolution is merged and then deployed immediately or at the next scheduled release.
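The rule can be sketched directly in code (a hypothetical API, not any real runtime): an asymmetric type carries two predicates, and we can probe that everything the constructor accepts, the deserializer also accepts.

```python
class AsymmetricField:
    def __init__(self, constructor_ok, deserializer_ok):
        self.constructor_ok = constructor_ok    # CV
        self.deserializer_ok = deserializer_ok  # DV

    def construct(self, value):
        if not self.constructor_ok(value):
            raise ValueError("rejected at construction")
        return value

    def deserialize(self, value):
        if not self.deserializer_ok(value):
            raise ValueError("rejected at deserialization")
        return value

# CV(x): x <= 4 entails DV(x): x <= 5, so CV => DV holds.
field = AsymmetricField(lambda x: x <= 4, lambda x: x <= 5)

# Probe the entailment over a sample domain (a real checker would do
# this statically, e.g. with an SMT solver).
assert all(field.deserializer_ok(x) for x in range(-10, 11)
           if field.constructor_ok(x))
```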
There is a rich design space of asymmetric types. Let's look at a few.
Defaults
Explicit field defaults
The most common form of asymmetric type is a field default.
It relaxes just the deserializer so that it accepts missing payload data.
But the default must be a member of the field's type T.
message SearchRequest {
}
message SearchRequest {
optional string user = 1
[default = "anonymous"];
}
message SearchRequest {
required string user = 1
[default = "anonymous"];
}
This poses two problems:
- Some types don't have a reasonable default value, particularly non-primitive messages.
- We may typically choose a 0 default for primitives, but this deprives the domain layer of a useful distinction: whether the 0 was a fallback or a legitimate value sent over the wire.
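A tiny sketch (hypothetical reader, not generated code) of the second problem: once a default is in play, a fallback and a legitimately sent value become indistinguishable.

```python
# With a 0 default, the reader cannot tell these payloads apart:
def parse_page_number(payload):
    return payload.get("page_number", 0)  # fallback and real value collide

assert parse_page_number({}) == 0                  # fallback
assert parse_page_number({"page_number": 0}) == 0  # legitimately sent 0
# Both parse to 0: the domain layer has lost the distinction.
```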
Implicit type defaults
Protocol Buffers is particularly egregious in its use of this defaulting behavior. Along with syntax for explicitly registering a default, it implicitly enrolls most primitive types into defaulting, and even enums, which default to the first value listed.
enum Corpus {
CORPUS_WEB = 1;
CORPUS_IMAGES = 3;
CORPUS_NEWS = 5;
}
enum Corpus {
CORPUS_UNSPECIFIED = 0;
CORPUS_WEB = 1;
CORPUS_IMAGES = 3;
CORPUS_NEWS = 5;
}
enum Corpus {
CORPUS_UNSPECIFIED = 0;
CORPUS_WEB = 1;
CORPUS_IMAGES = 3;
CORPUS_NEWS = 5;
CORPUS_VIDEO = 7;
}
Thus, proto enums are defined by programming convention to have an UNKNOWN variant, which either pollutes the domain model or necessitates a post-deserialization validation to strip out the UNKNOWN variant and map it onto an UNKNOWN-less domain model enum.
I feel this rather common pattern is an unfortunate rejection of the "parse, don't validate" philosophy.
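That stripping step looks something like the following sketch (hypothetical wire and domain enums, not generated code):

```python
import enum

# The conventional wire enum, with the UNKNOWN variant...
class WireCorpus(enum.Enum):
    UNKNOWN = 0
    WEB = 1
    IMAGES = 3

# ...and the UNKNOWN-less domain enum it must be validated into.
class DomainCorpus(enum.Enum):
    WEB = 1
    IMAGES = 3

def to_domain(wire):
    # The post-deserialization validation this section laments.
    if wire is WireCorpus.UNKNOWN:
        raise ValueError("unrecognized corpus")
    return DomainCorpus(wire.value)

assert to_domain(WireCorpus.WEB) is DomainCorpus.WEB
```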
Tristates
TODO: a differentiation between absence and non-recognition
Field fallbacks
Typical has a novel evolution mechanism: it strengthens the constructor validation such that it demands two values, but the deserialization validation requires only one, the fallback.
For choice types, Typical treats this dually: writers of optional/asymmetric cases must provide a fallback,
and readers can always recover to a known case while rollouts are in flight.
Good (adding and removing an optional choice case)
choice SendEmailResponse {
success = 0
error: String = 1
}
choice SendEmailResponse {
success = 0
error: String = 1
optional authentication_error: String = 2
}
choice SendEmailResponse {
success = 0
error: String = 1
}
Good (adding an asymmetric choice case, then promoting to required)
choice SendEmailResponse {
success = 0
error: String = 1
}
choice SendEmailResponse {
success = 0
error: String = 1
asymmetric please_try_again = 3
}
choice SendEmailResponse {
success = 0
error: String = 1
please_try_again = 3
}
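A loose sketch of the reader side (heavily simplified: as I understand it, Typical encodes the writer-supplied fallback alongside the value on the wire, which I approximate here with a fixed fallback case):

```python
# Hypothetical reader for SendEmailResponse: a payload carrying a case this
# version does not recognize recovers to the fallback instead of failing.
KNOWN_CASES = {"success", "error"}

def read_response(case, fallback="error"):
    return case if case in KNOWN_CASES else fallback

assert read_response("success") == "success"
assert read_response("authentication_error") == "error"  # falls back safely
```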
constructor required deserialization optional
The simplest means of avoiding the use of a default is to use the optional label when adding fields.
But unfortunately, optional alone offers no safe evolution pathway to transform a field into a
required field. optional simply cannot be transitioned into required
without incompatibility risk, and vice versa.
Use of required was perceived to be so fraught with danger at Google that the following tortured
convention was littered throughout the google3 monorepo:
message SearchRequest {
// required
optional string user = 1;
}
and the required label was removed entirely in proto3 syntax.
I strongly believe the removal of this label was unnecessary, and that an asymmetric label should have been added instead.
Thus, the second, less common asymmetric type we'll look at is the "constructor required, deserialization
optional" label. To save ink, I will instead call it asymmetric, the same as the field label from the IDL
Typical, and the namesake of this article.
The asymmetric label forces provision of a non-null value at the constructor site but permits
data absence in the payload.
Use of this label admits safe evolution semantics! We can now use it as a safe intermediate stage when adding a required field:
message SearchRequest {
}
message SearchRequest {
asymmetric string user = 1;
}
message SearchRequest {
required string user = 1;
}
and it follows that it may also permit:
- transitioning from optional to required
- transitioning from required to optional
- removing a required field
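Hypothetical generated code for the intermediate asymmetric stage: the constructor demands user, while the deserializer tolerates in-flight payloads that omit it.

```python
# Hypothetical generated code for: asymmetric string user = 1.
def make_search_request(user):
    if user is None:
        raise ValueError("user must be provided at construction")
    return {"user": user}

def parse_search_request(payload):
    # Old writers may not set user yet; surface absence as None rather
    # than failing, so in-flight payloads keep deserializing.
    return {"user": payload.get("user")}

assert parse_search_request({}) == {"user": None}     # old payload: accepted
assert make_search_request("ada") == {"user": "ada"}  # new writer: must set it
```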
Unproducible
The third type is unproducible.
For enums (or any sum type), it prevents the use of a value or variant in certain contexts, but permits its deserialization.
The label permits us to safely add a variant under the assumption that no other engineer can construct, serialize, and send it to deserializers that are unprepared to handle it.
enum PhoneType {
PHONE_TYPE_UNSPECIFIED = 0;
PHONE_TYPE_MOBILE = 1;
PHONE_TYPE_HOME = 2;
PHONE_TYPE_WORK = 3;
}
enum PhoneType {
PHONE_TYPE_UNSPECIFIED = 0;
PHONE_TYPE_MOBILE = 1;
PHONE_TYPE_HOME = 2;
PHONE_TYPE_WORK = 3;
unproducible PHONE_TYPE_FAX = 4;
}
enum PhoneType {
PHONE_TYPE_UNSPECIFIED = 0;
PHONE_TYPE_MOBILE = 1;
PHONE_TYPE_HOME = 2;
PHONE_TYPE_WORK = 3;
PHONE_TYPE_FAX = 4;
}
I suspect that it is quite difficult, although possible, to statically enforce unproducibility.
One way to enforce it is a linter that ensures that unproducible values and their constructors may only be used in if tests, switch case labels, and patterns.
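Absent a linter, unproducibility can at least be sketched as a runtime check (hypothetical names, not generated code), with the serializer refusing what the deserializer accepts:

```python
import enum

class PhoneType(enum.Enum):
    UNSPECIFIED = 0
    MOBILE = 1
    FAX = 4

UNPRODUCIBLE = {PhoneType.FAX}

def serialize(value):
    if value in UNPRODUCIBLE:
        raise ValueError(f"{value} is unproducible in this version")
    return value.value

def deserialize(raw):
    return PhoneType(raw)  # all declared values deserialize fine

assert deserialize(4) is PhoneType.FAX  # reading the new variant is allowed
```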
Two validations
The fourth type is "two validations".
Some IDLs support arbitrary field validations, such as Amazon's Smithy and Protocol Buffers with Buf's protovalidate.
But these validations inherit the restrictions placed upon required and optional
fields: they cannot change safely in RPC and API contexts.
Narrowing
message SearchRequest {
required int32 f = 1 [
(cel_validation) = "this <= 5"
]
}
message SearchRequest {
required int32 f = 1 [
(cel_validation) = "this <= 4"
]
}
Widening
message SearchRequest {
required int32 f = 1 [
(cel_validation) = "this <= 5"
]
}
message SearchRequest {
required int32 f = 1 [
(cel_validation) = "this <= 6"
]
}
Once again, decoupling the constructor validation and the deserialization validation admits the ability to change the validation safely.
Widening
message SearchRequest {
required int32 f = 1 [
(constructor_cel) = "this <= 5",
(deserialize_cel) = "this <= 5"
]
}
message SearchRequest {
required int32 f = 1 [
(constructor_cel) = "this <= 5",
(deserialize_cel) = "this <= 6"
]
}
message SearchRequest {
required int32 f = 1 [
(constructor_cel) = "this <= 6",
(deserialize_cel) = "this <= 6"
]
}
Narrowing
message SearchRequest {
required int32 f = 1 [
(constructor_cel) = "this <= 5",
(deserialize_cel) = "this <= 5"
]
}
message SearchRequest {
required int32 f = 1 [
(constructor_cel) = "this <= 4",
(deserialize_cel) = "this <= 5"
]
}
message SearchRequest {
required int32 f = 1 [
(constructor_cel) = "this <= 4",
(deserialize_cel) = "this <= 4"
]
}
Thus, a message field's constructor validation must entail the deserialization validation.
Checking whether the invariant holds, and whether a proposed evolution is legal, can prove difficult.
CEL's lattice-based type system could be used to check that evolutions of field validations are compatible with their previous schema versions.
Although somewhat overkill, I believe that Z3 would be sufficient to check the invariant, and to check that the constructor validation entails the deserialization validation of all previous schema versions.
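For the simple upper-bound constraints used above, the entailment check degenerates to comparing bounds. A sketch, with a hypothetical representation of "this <= N" constraints (a real checker over arbitrary CEL would need an SMT solver):

```python
# Constraints of the form "this <= N", represented by their bound N.
# CV(x): x <= c_bound entails DV(x): x <= d_bound iff c_bound <= d_bound.
def entails(c_bound, d_bound):
    return c_bound <= d_bound

def evolution_is_legal(new_constructor_bound, live_deserializer_bounds):
    # The new constructor must satisfy the deserializer validation of
    # every schema version that is still live.
    return all(entails(new_constructor_bound, d)
               for d in live_deserializer_bounds)

assert entails(5, 6)                     # widening step is fine
assert not entails(6, 5)                 # would violate CV => DV
assert evolution_is_legal(4, [5, 5, 4])  # narrowing against all live versions
```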
AI usage disclosure
All prose was handwritten, with predictive autocomplete disabled. Diagrams were generated. Proof-read by Claude Opus 4.6 on TODO.