The Cedarbridge language is defined without reference to any particular set of rules
for actually encoding values of the various user-defined types for transmission. This specification
section defines a canonical format for binary encoding,
CCBE, that all implementations are expected to be able to understand.
The CCBE is defined as following:
A byte is an eight-bit quantity with the individual bits written to the
transmission medium in
decreasing order of significance. That is, the most signficant bit
is written to the output first, and the least signficant bit is
written last. When the individual bits of an N-bit value
are enumerated, it is to be understood that the most signficant bit is numbered
0 and the least signficant bit is numbered
N-1. A byte can therefore be described as the following
product type:
For the purposes of specification, we assume the existence of a function
bit(n, x) : ℕ → α → 𝔹
that, given an arbitrary input value x of type α,
returns the value of bit n of the input value according to the above ordering and
bit numbering rules. This is merely a notational aid and should be understood to be something that implementations
are actually required to implement.
A
byte sequence is, unsuprisingly, a sequence of
bytes. The notation
[]
denotes a sequence of length
0, and the operator
⊕
prepends a byte to a
byte sequence. The operator
⊕ is
right-associative, so
∀x y z. x ⊕ y ⊕ z = x ⊕ (y ⊕ z).
The kind CCBE denotes the disjoint set of types present in the
encoding rules. The definition is as follows:
For each of the types in CCBE, we describe a function
encode(x) : (τ : CCBE) → (α : τ) → ByteSequence (where
τ is an implicit parameter that can be inferred
from α). The purpose of the encode
function is to describe how to transform a value of a given type to a byte sequence that can
be written to the destination transmission medium.
The S8 type is an opaque type that can hold a value in the range
[-128, 127].
Values of type S8 are encoded as follows:
The S16 type is an opaque type that can hold a value in the range
[-32768, 32767].
Values of type S16 are encoded as follows:
The S32 type is an opaque type that can hold a value in the range
[-2147483648, 2147483647].
Values of type S32 are encoded as follows:
The S64 type is an opaque type that can hold a value in the range
[-9223372036854775808, 9223372036854775807].
Values of type S64 are encoded as follows:
The U8 type is an opaque type that can hold a value in the range
[0, 255].
Values of type U8 are encoded as follows:
The U16 type is an opaque type that can hold a value in the range
[0, 65535].
Values of type U16 are encoded as follows:
The U32 type is an opaque type that can hold a value in the range
[0, 4294967295].
Values of type U32 are encoded as follows:
The U64 type is an opaque type that can hold a value in the range
[0, 18446744073709551615].
Values of type U64 are encoded as follows:
The F16 type is an opaque type that can hold a value in the value
set of the IEEE 754 binary16
type.
Values of type F16 are encoded as follows:
The F32 type is an opaque type that can hold a value in the value
set of the IEEE 754 binary32
type.
Values of type F32 are encoded as follows:
The F64 type is an opaque type that can hold a value in the value
set of the IEEE 754 binary64
type.
Values of type F64 are encoded as follows:
The ByteArray type is an opaque sequence of bytes. A value
b of type ByteArray has a fixed length
n denoted length(b) : ℕ = n. The property
∀x. 0 ≤ length(x) ≤ 4294967295 always holds.
The byte at position m (where 0 ≤ m < n)
is denoted byte(m, b) : Byte.
Values of type
ByteArray are encoded as follows: For a value
x, a byte sequence
k ⊕ p ⊕ [] is constructed,
where
k is the result of encoding
length(x)
as if it were a value of type
U32, and
p is simply
byte(m, x) for all
m | 0 ≤ m < length(x).
The List type is parameterized type describing a sequence of values. A value
x of type List α has a fixed length
n denoted length(x) : ℕ = n. The property
∀x. 0 ≤ length(x) ≤ 4294967295 always holds.
The value at position m (where 0 ≤ m < n)
is denoted value(m, x) : α.
Values of type
List are encoded as follows: For a value
x, a byte sequence
k ⊕ p ⊕ [] is constructed,
where
k is the result of encoding
length(x)
as if it were a value of type
U32, and
p is the byte sequence obtained by concatenating the
results of encoding
value(m, x) for all
m | 0 ≤ m < length(x).
The
Cedarbridge
String
type is encoded as if it were a
ByteArray after
having first converted the sequence of characters to a sequence of
UTF-8 encoded bytes.
A value of a
record type
r is encoded by concatenating the byte sequences produced by encoding
the
fields of
r in the order that the fields
were declared in the type of
r.
A value of a
variant type
r is encoded by first determining the
variant index
v of
r. The
variant
index is the number of the
case that was used
to construct
r, assuming that cases are numbered in declaration order starting
at
0. For the case
c that was used to
construct
r, the byte sequence produced for
r
is
k ⊕ p ⊕ [], where
k is the result of encoding
v as if it were a value of type
U32, and
p
is the concatenation of the byte sequences produced by encoding the fields of
r in the order that the fields
were declared in the declaration of
c.
Conceptually, each
type present in
a protocol version is added to one effectively anonymous variant type per version. The cases of
the variant are ordered lexicographically by the names of the types. This anonymous variant type
is then encoded exactly as regular
variant types
are normally encoded.
For example, assuming the following definitions:
When a message M is encoded in versioned form, we use
the notation V(M) to refer to the versioned nature of
the message.
In protocol version 1, the expression
V(A 23) : A is encoded as the byte sequence
0x00 0x00 0x00 0x00 0x17. That is, the first
four bytes specify the type at case 0 of the
anonymous variant implicitly defined by the [version 1 ...]
declaration, followed by the encoded form of (A 23) : A.
In protocol version 2, the expression
V(C1 23) : C is encoded as the byte sequence
0x00 0x00 0x00 0x02 0x00 0x00 0x00 0x01 0x17. That is, the first
four bytes specify the type at case 2 of the
anonymous variant implicitly defined by the [version 2 ...]
declaration, followed by the encoded form of (C1 23) : C.
In protocol version 3, the expression
P (B 23) : B is encoded as the byte sequence
0x00 0x00 0x00 0x00 0x17. That is, the first
four bytes specify the type at case 0 of the
anonymous variant implicitly defined by the [version 3 ...]
declaration. Because the type A was removed in version 3,
B is now the type at case 0,
whereas it was previously the type at case 1 in versions 1
and 2.
The expression (Some 23) : [Option IntegerUnsigned32] is encoded
as the byte sequence 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x17. This encoding
comes from the fact that Option is a variant type,
Some is case number 1, and the value
of the one and only field of Some is 23 (0x17).
The expression None : [Option IntegerUnsigned32] is encoded
as the byte sequence 0x00 0x00 0x00 0x00. This encoding
comes from the fact that Option is a variant type,
None is case number 0, and there
are no fields in the case.
Assuming the following type:
The expression [Vector3f 17.0 199.0 1.00781238] is encoded as the byte
sequence 0x41 0x88 0x00 0x00 0x43 0x47 0x00 0x00 0x3f 0x80 0xff 0xff. This
is simply three IEEE 754 binary32 values.
The string "hello" is encoded as the byte sequence
0x00 0x00 0x00 0x05 0x68 0x65 0x6c 0x6c 0x6f. This is the number of UTF-8
encoded bytes (5) followed by the UTF-8 encoded bytes of the string.
The expression [List 17038 27297 17288] : List IntegerSigned16 is encoded
as the byte sequence 0x00 0x00 0x00 0x03 0x42 0x8e 0x6a 0xa1 0x43 0x88. This
is the length of the list (3) followed by the encoded forms of
17038 (0x428e), 27297 (0x6aa1),
and 17288 (0x4388), respectively.