Index Trees

Index trees map a totally-ordered key space to an arbitrary value space, they may be thought of as B-trees, though the format doesn't require them to be self-balancing.

Encoding

The root node is a version and each other node is a blob. Each node holds a list of (greatest key, link) or (key, value) tuples, sorted by the key or greatest key. Links are the shared key and reference of a child node, and the value may be encoded directly or reference other nodes holding the indexed data.

Security

Observation

This is the typical case for a backed up database, or a node participating in synchronization. The attacker is assumed to be able to see the entire history of the encrypted index tree, but has no control over its contents.

Given the relatively large and consistent fan-out, the attacker can likely distinguish an index tree from other types of data. With knowledge of the application, this leaks the approximate number of keys, by the number and size of leaf nodes. The quality of this estimate may be reduced by values that reference blobs that also have similar fan-out.

While this attacker cannot associate a particular node with a set of keys (even relative to other keys), they can watch changes to similar areas over time, especially if there's a commit after every change. Batching changes muddies this leakage, one to one mappings become n to m.

Entry Modification

If the attacker can update the non-indexed part of a value and knows the key to that value, or if they can insert a value with a known key, they can locate the area of the key.

The more values an attacker can control this way, the more they insight they can gain into the structure of the data. With arbitrary insertion or modification an attacker can efficiently find the approximate boundaries of each node.

This is likewise mitigated by batching changes, though not eliminated if the attacker knows which changes contained updates to the same key.

External Reference Knowledge

If the index tree values include external references, related values may be determined by having nearby keys. This is especially relevant to file system use cases, as similar files are likely to be stored together, and files are likely to be external values.