Introduction

When personal computers were scattered machines, data ferried by hand from one to another on floppy disks or tapes, finding a piece of data meant remembering which physical media it was stored on.

Interconnection between these computers was slow, and grew slower with distance. These constraints placed control over applications and data in the operator's hands. Application providers made various, mostly futile, attempts to control applications, but relatively few attempts to control or collect user data. The power of computing and data rested with the operators.

The high-speed connections of the 2010s changed those constraints, and the ecosystem of software changed accordingly. An enormous amount of software now runs on computers in a datacenter somewhere, leaving us to interact with only a thin skin. The power, the control and utilization of data and applications, has shifted, for the most part, to application providers. In the connected world, finding a piece of data means remembering which device or cloud service stores which version.

People gave up that power for many reasons, but the most significant one is the ease of collaboration. It is orders of magnitude easier to work on a centralized document than to send different versions of that document around and keep track of changes. Distributed version control systems provide a theoretical solution, but in practice their use is mostly restricted to software development, and even there providers have leveraged significant centralization.

Where files are a basic interchange unit for individual local applications, nodes are a basic interchange unit for collaborative local applications.

  • References are one of the biggest differences between files and nodes.

    • Files are referenced by user-assigned names. Any global references must locate a file on a host and path within that host. These names are subject to "link rot" as the organization scheme changes, and make providing access to the same file at multiple locations difficult.

    • Nodes are referenced by globally unique unforgeable identifiers, which are the same no matter which host the node is stored on, or what organizational scheme is used. They may then additionally be given local names for user convenience.

  • Files and nodes can both reference other files and nodes.

    • References to other files must be in the data of the file. Extracting those references requires understanding each file format, and that the data be decrypted.

    • Nodes store references to other nodes outside their data. This allows the data to be encrypted or garbage collected without preventing synchronization.

  • Version handling and conflict resolution is completely different.

    • Files only have one identifier, their name. There are myriad systems to track changes in files over time, for backup, concurrent modification, etc., all working around this central flaw. On the flip side, if history retention is not desired, and there's no need for collaboration, files are an excellent fit.

    • Versioned nodes have multiple identifiers, one for any version of that node (in practice this usually means the latest), and then one for each version. While this makes concurrent modification very conveniently detectable and resolvable, saving ephemeral data to nodes may require special handling.

Updates on versioned nodes can be passed from stranger to stranger or friend to friend, across the Internet or on external media. Subscriptions, asking for these updates, tie in nicely with existing relationships, between friends and family, colleagues, and collaborators, while strangers on the Internet can share with a more swarm-like protocol.

This handbook is a guide for working with versioned nodes, from their core data model and cryptography to containers and formats built with them, to protocols for building data networks for them.

Motivations and Goals

Numerous motivations shaped the design of versioned nodes and the data network based on subscriptions, and especially the core data model. I wanted a data synchronization tool, and there were a number of different things I wanted to be able to do with it.

Small Core

The core requirements to run a node should be small enough to fit in a moderately capable embedded device. It could be used to log sensor data or distribute configuration (or anything short of real-time control).

General Purpose

A lot of existing tools are dedicated to being a particular application - the most general being a file system or database. This should be very general purpose, such that distributed file systems and databases can be created with it, as well as other more specialized applications.

Capability Access Control with Delegation

If I can read an object, I should be able to give you access to that object without creating a duplicate object or sharing my credentials.

End-to-End Encryption

The data stored or put into a data network should be encrypted. Neither intermediate servers nor storage servers should be able to read it. It should only be readable by the endpoints that have the appropriate capabilities.

Concurrent Modification and Disconnected Operation

Two disconnected applications should be able to modify the same object at the same time, and be able to reconcile any differences in their changes later.

Public Versioning

The encrypted form of mutable objects should have sufficient information to determine the latest versions of that object.

Public Dependencies

The encrypted form of the objects should have sufficient information to optimistically deliver objects an object depends on, and remove unused objects from storage.

Eventual Consistency and Determinism

Two endpoints with the same information, making the same update, should by default create identical updates, with identical ciphertext. This is important for avoiding synchronization loops.

If two identical updates result in different ciphertexts, they may each trigger a new pair of updates when those updated versions propagate, which again results in a new pair of different ciphertexts…

Transport Agnosticism

There shouldn't be any special transport requirements for synchronization. Links may be formed over any sort of serial port, packet switched network, or storage media.

Indistinguishable from Random Transport-Adapter

The bits sent across a link should be indistinguishable from random, without knowing some shared key.

Forward and Post-Compromise Secure Transport-Adapter

If someone records the bits sent across a link, and then later compromises the state of one or both parties, they should not be able to recover past messages, and should shortly become unable to recover messages in the future.

Topology Agnostic

There shouldn't be any particular topology required: a highly connected DHT/BitTorrent-style swarm, prearranged trusted connections only, a hierarchical content distribution network, or an extremely sparsely connected opportunistic network should all work.

Access Authorized

It should be possible to restrict fetching objects from a node to authorized users, even though the objects are encrypted. In particular, it should be possible to limit which objects can be fetched over swarm mode.

Liveness

Updates to objects should be propagated quickly through the network to interested parties.

Retention

Operators should be able to mark objects (and trees of objects) to be kept longer term. That is to say, pinning, backups, et cetera.

Nodes

Abstractly, a node is a structure that holds some data and references to other nodes. From a graph theory perspective, a node represents a vertex in a directed graph and a set of outgoing edges. Any properties of that vertex or labels on those edges (including quantity in a multigraph) are contained in the data attached to the node.

A reference to a node forms the capability to fetch it from a remote location. The references held by a node may be read by anyone with the node. Thus, if you can fetch a node you can (in theory) transitively fetch the whole graph reachable from that node. These references typically take the form of a hash or cryptographic signature, which allows the integrity of the node to be verified.

Reading the data on a node requires the read capability for that node. This typically takes the form of an encryption key, so even if you hold a node, its data is inaccessible without the key.

Some nodes come in versions. Creating a new version of a node requires a write capability for that node. This typically takes the form of a signing key, and is shared between all potential writers of a node.

Some data may be additionally restricted from reading by non-writers of a node, in particular the write-capabilities of referenced nodes.

Nodes may hold up to 1 megabyte of data, plus cryptographic overhead. They may list up to 256 references.
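
The capability model above can be sketched with Rust types. This is an illustrative sketch with hypothetical names; the sizes follow the serialization sections later in this handbook, and the concrete encodings are specified there.

```rust
// Hypothetical capability types illustrating the model above; the
// concrete encodings are defined in the Blobs/Versions chapters.

/// Fetch capability: held in the referencing node, readable by anyone
/// holding that node. Integrity is checked via hash or signature.
enum Reference {
    BlobHash([u8; 32]),         // an immutable node
    VersionSignature([u8; 48]), // one specific version
    BraidPublicKey([u8; 32]),   // "whichever version is latest"
}

/// Read capability: decrypts the node's data.
struct ReadCap {
    shared_key: [u8; 32],
}

/// Write capability: signs new versions; shared among all writers.
struct WriteCap {
    secret_key: [u8; 32],
}

fn main() {
    // Holding only a Reference lets you fetch and verify a node,
    // but not read or modify it.
    let fetch = Reference::BlobHash([0u8; 32]);
    let read = ReadCap { shared_key: [0u8; 32] };
    let write = WriteCap { secret_key: [0u8; 32] };
    match fetch {
        Reference::BlobHash(h) => assert_eq!(h.len(), 32),
        _ => unreachable!(),
    }
    assert_eq!(read.shared_key.len(), write.secret_key.len());
}
```

Note that separating fetch, read, and write into distinct values is what makes delegation possible: each can be handed to another party independently.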

Blobs

Blobs are minimal immutable nodes, containing nothing more than some encrypted data and a set of references.

struct Blob {
  data: Ciphertext,
  references: ReferenceSet,
}

Serialization

Blobs are serialized using atlv, as an array of those two fields, using the tag 0. The references must be listed in lexicographic order. As an example, a 128 byte ciphertext with a single reference to another blob:

offset   bytes   description
00       80      the tag (0) of the blob
01       42      the array header (2 elements)
02       c1 00   the binary header for the ciphertext (128 bytes)
03…82    ...     the ciphertext
83       41      the array header (1 element)
84       80      the tag (0) of a Blob Hash Reference
85       81      the tag (1) of a Blake3 hash
86       20      the binary header (32 bytes)
87…a6    ...     the content of the Blake3 hash

References

There is only one type of reference to blobs, "Blob Hash References". They are produced by initializing a stateful hash object with the domain "Versioned Node Subscriptions: Reference: Blob: Hash", feeding in the ciphertext, demarcating, feeding the serialized list of references, and finalizing.

Blob Hash References are serialized with the tag 0 followed by the serialization of the contained hash.

Summaries

Blob Summaries are records of blobs that elide the contained data.

struct BlobSummary {
  state: HashState,
  references: ReferenceSet,
}

The hash state for a Blob Summary is produced by initializing a SHO with the domain "Versioned Node Subscriptions: Reference: Blob: Hash", feeding in the ciphertext, and then extracting the state.

The reference for a Blob may be generated from its Summary, by injecting the preserved hash state, feeding the serialized list of references, and finalizing.

Cryptography

Encryption

The ciphertext held within a Blob is produced using deterministic authenticated encryption with associated data, with a domain of "Versioned Node Subscriptions: Blob Encryption". The associated data is the serialized set of references, including the length. (In the above example, this is bytes 0x83 to the end.) The encryption key is derived using the DAEAD's FromPlaintext method, with an application-provided convergence domain, which may be empty.

Analysis

Blobs provide read-only content-addressed encrypted storage. As such, they are vulnerable to chosen-plaintext attacks, including the confirmation-of-a-file and learn-the-remaining-information attacks described by Tahoe-LAFS.

There are two mechanisms to mitigate this. One is that the key and nonce derivation both take the transitive content of the Blob into account, so two blobs with the same data but different references will have different ciphertexts and keys. The other is the same as Tahoe-LAFS uses: the application can provide a convergence domain, and the attacks only apply within that convergence domain.

Braids

A braid is a set of versions, and possibly a final, which share the same public key. It provides the illusion of a mutable node by tracking the most recent version. Of course, there may be more than one most recent version, and it is then up to the application to decide how to handle that.

Access control is handled with access to cryptographic keys. Anybody with the:

  • public key can fetch a braid (assuming they have access to a store that contains it).

  • shared key can read a braid's contents (assuming they can fetch it).

  • secret key and shared key can write to a braid.

  • rotation signing key and rotation seed can finalize a braid, possibly marking a successor.
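
The "more than one most recent version" case can be made concrete: the heads of a braid are the versions that no other known version lists as a parent. A minimal sketch, using integer ids in place of version signatures:

```rust
use std::collections::HashSet;

// A braid's "latest version" is any version that no other known version
// lists as a parent. Hypothetical minimal model: (id, parent ids) pairs
// stand in for versions carrying signatures.
fn heads(versions: &[(u32, Vec<u32>)]) -> Vec<u32> {
    let parents: HashSet<u32> = versions
        .iter()
        .flat_map(|(_, ps)| ps.iter().copied())
        .collect();
    versions
        .iter()
        .map(|(id, _)| *id)
        .filter(|id| !parents.contains(id))
        .collect()
}

fn main() {
    // 1 <- 2 <- 3 and 1 <- 4: concurrent modification produced two heads.
    let braid = vec![
        (1, vec![]),
        (2, vec![1]),
        (3, vec![2]),
        (4, vec![1]),
    ];
    let mut h = heads(&braid);
    h.sort();
    assert_eq!(h, vec![3, 4]); // the application must reconcile these
}
```

When two heads exist, an application typically reconciles them by creating a new version listing both as parents.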

Versions

Versions are immutable nodes with history, which underlie virtual mutable nodes, braids. In addition to the encrypted data and a set of references, they hold a list of up to 16 parent versions.

struct Version {
  data: Ciphertext,
  references: ReferenceSet,
  parents: CappedVec<16, VersionSignature>,
}

Serialization

Versions are serialized using atlv, as an array of those three fields, using the tag 1. The references must be listed in lexicographic order. As an example, a 128 byte ciphertext with a single reference to another braid and a single parent:

offset   bytes   description
00       81      the tag (1) of a version
01       43      the array header (3 elements)
02       c1 00   the binary header for the ciphertext (128 bytes)
03…82    ...     the ciphertext
83       41      the array header of the references (1 element)
84       82      the tag (2) of a Braid Public Key Reference
85       81      the tag (1) of a Schnorr-Ristretto255-Blake3 public key
86       20      the binary header for the public key (32 bytes)
87…a6    ...     the content of the Schnorr-Ristretto255-Blake3 public key
a7       41      the array header of the parents (1 element)
a8       81      the tag (1) of a Schnorr-Ristretto255-Blake3 signature
a9       30      the binary header for the signature (48 bytes)
aa…d9    ...     the content of the Schnorr-Ristretto255-Blake3 signature

References

There are two types of references to versions:

  • "Version Signature References" are produced by initializing a [deterministic signature] with a private key and the domain "Version Node Subscriptions: Reference: Version: Signature", feeding in the ciphertext, demarcating, feeding the serialized list of references, feeding the serialized list of parents, and finalizing.

  • "Braid Public Key References" are the public key corresponding to the secret key used to sign the set of versions.

Summaries

Version Summaries are records of versions that elide the contained data.

struct VersionSummary {
  state: VerifyState,
  references: ReferenceSet,
  parents: Vec<VersionSignature>,
}

The verification state for a Version Summary is produced by initializing a [deterministic signature] with a public key and the domain "Version Node Subscriptions: Reference: Version: Signature", feeding in the ciphertext, and then extracting the state.

The reference for a Version may be verified from its Summary, by injecting the preserved verification state, feeding the serialized list of references, feeding the serialized list of parents, and finalizing.

Cryptography

Encryption

The ciphertext held within a Version is produced using deterministic authenticated encryption with associated data, with a domain of "Versioned Node Subscriptions: Version Encryption". The associated data is the concatenation of:

  1. the serialized public key of the braid
  2. the serialized set of references
  3. the serialized list of parent versions

(In the above example, the second two correspond to the bytes 0x83 to the end.)

Analysis

Deterministic authenticated encryption with associated data is subject to chosen plaintext attacks, however, this is significantly mitigated by the entire history of the version being represented in the associated data. In order to identify a ciphertext, the attacker must have access to a different oracle for each guess at the plaintext.

Finals

Finals indicate the final version of a braid, and a successor braid. There may only be a single final for each braid, and they require special user interaction to create. If there are multiple finals for a braid, something has gone wrong. One of them is very likely a forgery, but there is no way to tell which one automatically.

struct Final {
  data: Ciphertext,
  version: VersionSignature,
  successor: BraidPublicKey,
  commitment: FinalCommitment,
  authority: RotationPublicKey,
  signature: RotationSignature,
}

A final may specify the public key of its braid as its own successor.

Serialization

Finals are serialized using atlv, as an array of those six fields, using the tag 2. As an example, a 128 byte ciphertext for a final with all generation 0 cryptography:

offset    bytes   description
00        82      the tag (2) of a final
01        46      the array header (6 elements)
02        c1 00   the binary header for the ciphertext (128 bytes)
03…82     ...     the ciphertext
83        80      the tag (0) of a Schnorr-Ristretto255-Blake3 signature
84        30      the binary header for the signature (48 bytes)
85…b4     ...     the content of the Schnorr-Ristretto255-Blake3 signature
b5        80      the tag (0) of a Schnorr-Ristretto255-Blake3 public key
b6        20      the binary header for the public key (32 bytes)
b7…d6     ...     the content of the Schnorr-Ristretto255-Blake3 public key
d7        80      the tag (0) of a Schnorr-Ristretto255-Blake3 public key
d8        20      the binary header for the public commitment key (32 bytes)
d9…f8     ...     the content of the Schnorr-Ristretto255-Blake3 public commitment key
f9        80      the tag (0) of a Schnorr-Ristretto255-Blake3 public key
fa        20      the binary header for the public commitment key (32 bytes)
fb…11a    ...     the content of the Schnorr-Ristretto255-Blake3 public commitment key
11b       80      the tag (0) of a Schnorr-Ristretto255-Blake3 signature
11c       30      the binary header for the signature (48 bytes)
11d…14c   ...     the content of the Schnorr-Ristretto255-Blake3 signature

References

Finals are fetched by the Braid Public Key Reference, which can be derived from the commitment and authority, as specified by the signature scheme.

Cryptography

Cryptography is delicate.

We aim to use robust and efficient components, with mandatory domain separation and a lack of common gotchas.

Unfortunately, most conventional components fall short: XChaCha20-Poly1305 doesn't provide key commitment, Ed25519 has problems with malleability, and neither provides a standard way to do domain separation.

The cryptography is organized into generations, sets of implementations which fulfill the component requirements. To reduce code bloat each generation should be implemented with a minimal set of primitives.

There is currently one generation, built from these primitives:

  • Blake3 for data hashing, authenticity, and key derivation.
  • XChaCha8 for encryption.
  • Ristretto255 for public key operations.

Stateful Hash Objects

Stateful Hash Objects are used in many places one would otherwise specify hash functions. However, they have a few more capabilities and requirements than a plain hash function. The ones used here are based on Trevor Perrin's Stateful Hash Objects, but with the addition of extraction and injection, and stricter requirements for demarc/ratchet.

Operations

  • Initialize(domain): Creates a SHO, such that passing in a different domain results in a different SHO.

  • Clone(): Duplicates the current state of the SHO into a new SHO.

  • Feed(input): This inserts data into the SHO.

  • Extract(): Extract a summary of the current input. This output must not be derivable from the output of Crunch at the same state, nor vice-versa.

  • Inject(state): Injects a summary of some input, resuming with a fresh minimal object.

  • Crunch(): Produce a summary of the current input. This output must not be derivable from the output of Extract at the same state, nor vice-versa.

  • Demarc(): Creates an out-of-band demarcation point in the input stream, such that no possible input can collide with it. This must be equivalent to Extract followed by Inject.
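
A toy, non-cryptographic SHO can make this contract concrete. This is an illustrative stand-in only (the real generation uses Blake3); the property shown is that demarcating mid-stream is equivalent to extracting a summary and resuming from it, which is exactly how Summaries reproduce a node's reference without holding its data.

```rust
use std::num::Wrapping;

// Toy, non-cryptographic SHO that follows the contract above, for
// illustration only; the real generation-1 SHO is Blake3.
#[derive(Clone)]
struct ToySho {
    state: Wrapping<u64>,
}

impl ToySho {
    fn initialize(domain: &[u8]) -> Self {
        let mut sho = ToySho { state: Wrapping(0x9e37_79b9_7f4a_7c15) };
        sho.feed(domain); // different domains diverge immediately
        sho
    }
    fn feed(&mut self, input: &[u8]) {
        for &b in input {
            self.state = (self.state ^ Wrapping(b as u64)) * Wrapping(0x0100_0000_01b3);
        }
    }
    // Extract and Crunch apply distinct tweaks so neither output is
    // trivially derivable from the other (a toy stand-in for real
    // output separation).
    fn extract(&self) -> u64 {
        self.state.0 ^ 0x5ca1_ab1e
    }
    fn crunch(&self) -> u64 {
        self.state.0 ^ 0xdeca_fbad
    }
    fn inject(summary: u64) -> Self {
        let mut sho = ToySho { state: Wrapping(0x0b5e_55ed) };
        sho.feed(&summary.to_le_bytes());
        sho
    }
    // Demarc is required to be equivalent to Extract followed by Inject.
    fn demarc(&mut self) {
        *self = ToySho::inject(self.extract());
    }
}

fn main() {
    // Feeding with a demarc in the middle...
    let mut full = ToySho::initialize(b"example domain");
    full.feed(b"ciphertext");
    full.demarc();
    full.feed(b"references");

    // ...matches resuming from an extracted summary.
    let mut prefix = ToySho::initialize(b"example domain");
    prefix.feed(b"ciphertext");
    let mut resumed = ToySho::inject(prefix.extract());
    resumed.feed(b"references");

    assert_eq!(full.crunch(), resumed.crunch());
    assert_ne!(full.extract(), full.crunch()); // distinct outputs
}
```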

Serialization

Stateful Hash Object output hashes are serialized with a 0 tag followed by the serialization of the contained hash.

Stateful Hash Object extractions are serialized with a 32 tag followed by the serialization of the contained extraction.

Implementations

Blake3

The stateful hash object for generation 1 is Blake3, a very fast hash function, PRF, MAC, KDF, and XOF that is secure against length extension.

  • Initialization gives the domain to the 'derived_key' function.

  • Clone simply duplicates the hasher.

  • Feed gives the input to the 'update' function

  • Extract uses the finalize_xof function, seeks 64 bytes in, and returns 32 bytes of output.

  • Inject creates a new hasher with the keyed function.

  • Crunch uses the finalize_xof function, and returns the first 32 bytes of output.

Blake3 hashes and extractions are serialized with the 0 tag, followed by a 32 byte binary.

Deterministic Authenticated Encryption with Associated Data

These encryption algorithms are required to provide:

  • determinism: given the same inputs, the same outputs result.

  • divergence: if any bit of any input differs, the outputs must be unrelated.

  • key commitment: only one key will successfully decrypt the ciphertext.

  • associated data commitment: the associated data must match to decrypt the ciphertext.

  • conditional chosen plaintext attack resistance: if associated data varies and is not under control of the attacker, the ciphertexts are indistinguishable under adaptive chosen plaintext attacks.

Operations

  • FromMaster(domain, master_key): Derive a shared encryption key from a master key.

  • FromPlaintext(domain, convergence_domain, plaintext, associated_data): Derive a shared encryption key from the plaintext.

  • Encrypt(domain, shared_key, plaintext, associated_data): Encrypt a plaintext with a given key and domain, in the context of the given data.

  • Decrypt(domain, shared_key, ciphertext, associated_data): Decrypt a cipher with a given key and domain, in the context of the given data. If the domain, key, or associated data do not go with the ciphertext, fail.

Implementations may also provide a combination FromPlaintext and Encrypt that returns both the shared key and ciphertext.

Serialization

Shared Keys are serialized with an 8 tag followed by the serialization of the contained key.

Implementations

XChaCha8-Blake3-SIV

The DAEAD for generation 1 is an encrypt+MAC scheme constructed from the Blake3 Stateful Hash Object and the extended-nonce ChaCha construction with 8 rounds. This construction is crafted to allow convergent encryption in two passes: the first pass derives the shared key and IV, and the second pass encrypts.

  • FromMaster invokes Blake3 with the domain "XChaCha8-Blake3-SIV: Derivation From Master Key", feeds in the provided domain, demarcates, feeds in the master key, finalizes.

  • FromPlaintext invokes Blake3 with the domain "XChaCha8-Blake3-SIV: Derivation From Plaintext", feeds in the provided domain, demarcates, feeds in the plaintext, demarcates, feeds in the associated data, demarcates, feeds in "shared key generation", feeds in the convergence domain, finalizes.

  • Encrypt:

    • Derives the IV by invoking Blake3 with the domain "XChaCha8-Blake3-SIV: Derivation From Plaintext", feeding in the provided domain, demarcating, feeding in the plaintext, demarcating, feeding in the associated data, demarcating, feeding in "initialization vector generation", feeding in the shared key, finalizing, and truncating to 24 bytes.
    • Derives the encryption key from the shared key by invoking Blake3 with the domain "XChaCha8-Blake3-SIV: Encryption Key Derivation", feeding in the shared key, and finalizing.
    • Emits the IV, and then applies XChaCha8 to the plaintext.
  • Decrypt:

    • Derives the encryption key from the shared key by invoking Blake3 with the domain "XChaCha8-Blake3-SIV: Encryption Key Derivation", feeding in the shared key, and finalizing.
    • Splits the IV off the front of the ciphertext.
    • Applies XChaCha8 to the ciphertext.
    • Derives the test IV by invoking Blake3 with the domain "XChaCha8-Blake3-SIV: Derivation From Plaintext", feeding in the provided domain, demarcating, feeding in the plaintext, demarcating, feeding in the associated data, demarcating, feeding in "initialization vector generation", feeding in the shared key, finalizing, and truncating to 24 bytes.
    • Checks the IV against the test IV; if equal, emits the plaintext, otherwise errors.
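
The two-pass SIV structure can be sketched with non-cryptographic stand-ins: a toy 64-bit mixing function in place of Blake3, and a xorshift keystream in place of XChaCha8. The shape matches the steps above — the synthetic IV commits to the plaintext, associated data, and key, and decryption recomputes and checks it.

```rust
use std::convert::TryInto;

// Toy SIV-style DAEAD illustrating the two-pass structure; the mixing
// function and stream cipher are non-cryptographic stand-ins.
fn mix(parts: &[&[u8]]) -> u64 {
    let mut s: u64 = 0x9e37_79b9_7f4a_7c15;
    for part in parts {
        for &b in *part {
            s = (s ^ b as u64).wrapping_mul(0x0100_0000_01b3);
        }
        s = s.rotate_left(17) ^ 0xdead_beef; // demarcation stand-in
    }
    s
}

fn keystream(seed: u64, len: usize) -> Vec<u8> {
    let mut x = seed | 1; // keep the xorshift state nonzero
    (0..len)
        .map(|_| {
            x ^= x << 13;
            x ^= x >> 7;
            x ^= x << 17;
            x as u8
        })
        .collect()
}

fn encrypt(shared_key: u64, plaintext: &[u8], ad: &[u8]) -> Vec<u8> {
    // First pass: the synthetic IV commits to plaintext, AD, and key.
    let iv = mix(&[plaintext, ad, &shared_key.to_le_bytes()[..]]);
    // Second pass: encrypt under a key derived from shared key and IV.
    let stream = keystream(shared_key ^ iv, plaintext.len());
    let mut out = iv.to_le_bytes().to_vec();
    out.extend(plaintext.iter().zip(stream).map(|(p, k)| p ^ k));
    out
}

fn decrypt(shared_key: u64, ciphertext: &[u8], ad: &[u8]) -> Option<Vec<u8>> {
    let (iv_bytes, body) = ciphertext.split_at(8);
    let iv = u64::from_le_bytes(iv_bytes.try_into().ok()?);
    let stream = keystream(shared_key ^ iv, body.len());
    let plaintext: Vec<u8> = body.iter().zip(stream).map(|(c, k)| c ^ k).collect();
    // Recompute the IV; a mismatch means a wrong key or associated data.
    let test_iv = mix(&[&plaintext[..], ad, &shared_key.to_le_bytes()[..]]);
    if test_iv == iv { Some(plaintext) } else { None }
}

fn main() {
    // FromPlaintext stand-in: the key is derived from the content.
    let key = mix(&[&b"convergence domain"[..], &b"hello"[..]]);
    let ct = encrypt(key, b"hello", b"refs");
    assert_eq!(decrypt(key, &ct, b"refs"), Some(b"hello".to_vec()));
    assert_eq!(decrypt(key, &ct, b"tampered"), None); // AD commitment
    assert_eq!(ct, encrypt(key, b"hello", b"refs")); // determinism
}
```

Note how determinism falls out for free: all inputs to key, IV, and keystream derivation are the plaintext, associated data, and domains, with no random nonce anywhere.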

Deterministic Signatures with Key Rotation

Deterministic Signatures are used to identify versions. As the name suggests, two signatures with the same key and of the same value should result in the same signature. However, it is not required that there is only one possible valid signature, just that there is a convention of creating the same signature.

The signed data is fed into the signer/verifier like a SHO, and most schemes will use a SHO internally. This allows for signatures to be verified on a summary as well as on a complete input.

In addition to strong existential unforgeability of messages and signatures for a given public key, we require existential unforgeability of messages and public keys for a given signature. This prevents forging a version with a particular signature.

Operations

  • Initialize(domain): Creates the signer-verifier. Passing in a different domain will result in a different signature.

  • Clone(): duplicates the signer-verifier.

  • Feed(input): This inserts data into the signer-verifier.

  • Extract(): Extract a summary of the current input.

  • Inject(state): Creates a signer-verifier from the extracted state.

  • Demarc(): Creates an out-of-band demarcation point in the input stream, such that no possible input can collide with it. This must be equivalent to Extract followed by Inject.

  • Sign(secret_key, public_key): Create a signature using the given secret key for a particular public key. Typically, the public key will be the one corresponding to the secret key, but a wrapped threshold scheme may target a different public key.

  • Verify(public_key, signature): Check the current input against a public key and signature.

  • Derive(master_key, rotation_key): Derive a secret key and a commitment, from a master key and a public rotation key.

Serialization

Signatures are serialized with a 1 tag followed by the serialization of the contained signature. Public Keys are serialized with a 2 tag followed by the serialization of the contained key. Secret Keys are serialized with a 3 tag followed by the serialization of the contained key. Signer-Verifier extractions are serialized with a 33 tag followed by the serialization of the contained extraction.

Implementations

Schnorr-Ristretto255-Blake3

Index Trees

Index trees map a totally-ordered key space to an arbitrary value space. They may be thought of as B-trees, though the format doesn't require them to be self-balancing.

Encoding

The root node is a version and each other node is a blob. Each node holds a list of (greatest key, link) or (key, value) tuples, sorted by the key or greatest key. Links are the shared key and reference of a child node, and the value may be encoded directly or reference other nodes holding the indexed data.
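
A minimal in-memory sketch of lookup in such a tree, with hypothetical types (real links carry a shared key and a blob reference rather than a boxed child, and values may reference other nodes):

```rust
// In-memory sketch of index tree lookup; hypothetical types.
enum IndexNode {
    Branch(Vec<(String, Box<IndexNode>)>), // (greatest key in child, link)
    Leaf(Vec<(String, u64)>),              // (key, value)
}

fn lookup(node: &IndexNode, key: &str) -> Option<u64> {
    match node {
        IndexNode::Branch(children) => {
            // Descend into the first child whose greatest key covers ours.
            let (_, child) = children
                .iter()
                .find(|(greatest, _)| key <= greatest.as_str())?;
            lookup(child, key)
        }
        IndexNode::Leaf(entries) => entries
            .iter()
            .find(|(k, _)| k == key)
            .map(|(_, v)| *v),
    }
}

fn main() {
    let tree = IndexNode::Branch(vec![
        ("b".into(), Box::new(IndexNode::Leaf(vec![("a".into(), 1), ("b".into(), 2)]))),
        ("d".into(), Box::new(IndexNode::Leaf(vec![("c".into(), 3), ("d".into(), 4)]))),
    ]);
    assert_eq!(lookup(&tree, "c"), Some(3));
    assert_eq!(lookup(&tree, "z"), None);
}
```

In the encrypted encoding, each descent decrypts one node with the shared key from its parent's link, so lookup touches only one path from root to leaf.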

Security

Observation

This is the typical case for a backed up database, or a node participating in synchronization. The attacker is assumed to be able to see the entire history of the encrypted index tree, but has no control over its contents.

Given the relatively large and consistent fan-out, the attacker can likely distinguish an index tree from other types of data. With knowledge of the application, this leaks the approximate number of keys, by the number and size of leaf nodes. The quality of this estimate may be reduced by values that reference blobs that also have similar fan-out.

While this attacker cannot associate a particular node with a set of keys (even relative to other keys), they can watch changes to similar areas over time, especially if there's a commit after every change. Batching changes muddies this leakage: one-to-one mappings become n-to-m.

Entry Modification

If the attacker can update the non-indexed part of a value and knows the key of that value, or if they can insert a value with a known key, they can locate the area of the tree containing that key.

The more values an attacker can control this way, the more insight they can gain into the structure of the data. With arbitrary insertion or modification, an attacker can efficiently find the approximate boundaries of each node.

This is likewise mitigated by batching changes, though not eliminated if the attacker knows which changes contained updates to the same key.

External Reference Knowledge

If the index tree values include external references, related values may be determined by having nearby keys. This is especially relevant to file system use cases, as similar files are likely to be stored together, and files are likely to be external values.

Measure Trees

Measure trees map a totally-ordered measure space with an associative operation to a value space from which the measure is derived. They are similar to finger trees, but their topology may be arbitrary.

The leaves of the tree hold the values, and each branch of the tree holds a list of edges to either branches or leaves, along with the measures of the linked nodes.

Encoding

The root branch is a version and other branches are blobs. Each branch holds a list of (measure, link) or (measure, value) tuples. The leaves may be the values directly, or the values may reference other blobs. Values should not reference braids unless the measure is independent of the content of the braid.

Examples

Random access into arbitrarily large byte sequences may be efficiently achieved with measure trees. The leaves are blobs holding raw byte sequences, measured by their size. A particular position of the byte sequence may be found by accumulating the sizes of each item in the root node until the accumulated measure at an item exceeds the start of the range. If a branch, descend into that item and repeat, if a leaf, descend into the leaf and seek to the appropriate offset.
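
That procedure can be sketched with hypothetical in-memory types. In the real encoding, branches are blobs or versions that store each child's measure alongside its link, so descent needn't recompute measures as this simplified sketch does:

```rust
// Sketch of seeking into a size-measured tree of byte chunks,
// following the procedure described above. Hypothetical types.
enum MeasureNode {
    Branch(Vec<MeasureNode>),
    Leaf(Vec<u8>),
}

impl MeasureNode {
    fn measure(&self) -> usize {
        match self {
            MeasureNode::Leaf(bytes) => bytes.len(),
            MeasureNode::Branch(children) => children.iter().map(|c| c.measure()).sum(),
        }
    }

    /// Return the byte at `pos`, descending by accumulated sizes.
    fn byte_at(&self, mut pos: usize) -> Option<u8> {
        match self {
            MeasureNode::Leaf(bytes) => bytes.get(pos).copied(),
            MeasureNode::Branch(children) => {
                for child in children {
                    let m = child.measure();
                    if pos < m {
                        return child.byte_at(pos);
                    }
                    pos -= m; // skip this child's span
                }
                None
            }
        }
    }
}

fn main() {
    let tree = MeasureNode::Branch(vec![
        MeasureNode::Leaf(b"hello ".to_vec()),
        MeasureNode::Branch(vec![
            MeasureNode::Leaf(b"wor".to_vec()),
            MeasureNode::Leaf(b"ld".to_vec()),
        ]),
    ]);
    assert_eq!(tree.measure(), 11);
    assert_eq!(tree.byte_at(7), Some(b'o')); // "hello world"[7] == 'o'
    assert_eq!(tree.byte_at(11), None);
}
```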