crdt-reference.md markdown

475 lines 14.8 KB

sha256:248464b6a2f758985cbef90f864fa62c61842be699d975d6e00b6a9509ef919c fix(delta): detect blob-identical file renames for files wi… Sonnet 4.6 patch 25 days ago

Muse CRDT Reference

Audience: Plugin authors adding CRDT Semantics to their domain plugin, or anyone curious about how Muse achieves conflict-free convergence.

Implementation: muse/core/crdts/ — six primitives, each with full type safety, to_dict/from_dict serialisation, and tested lattice-law compliance.

Why CRDTs?
The Three Lattice Laws
Primitive Reference
- VectorClock
- LWWRegister
- ORSet
- RGA
- AWMap
- GCounter
Combining Primitives
CRDTPlugin Integration
Performance Notes
When Not to Use CRDTs

Why CRDTs?

In classical three-way merge (Muse Phases 1–3), two branches that both edit the same element produce a conflict that a human must resolve. This is correct and desirable for human-paced collaborative editing — the human has an opinion and should make the final call.

But consider a different scenario: twenty automated agents simultaneously annotating a genome, or a distributed sensor network writing telemetry, or a DAW plugin streaming real-time automation changes from multiple collaborators. In these cases:

Conflicts are too frequent to be individually resolvable
No human is present to arbitrate
The agents don't coordinate in real time
Messages may be delayed, reordered, or duplicated

CRDTs (Conflict-free Replicated Data Types) solve this by changing the definition of a "write." Instead of "replace the current value," each write is a join on a partial order — the state space is a lattice, and the merge of any two states is always the least upper bound of both.

The result: join always converges to the same final state, regardless of message order, delay, or duplication. No conflict state ever exists.

The Three Lattice Laws

Every CRDT join operation must satisfy all three laws. Muse's test suite verifies them for all six primitives. If you build a composite CRDT from these primitives, your join inherits these properties automatically (lattice composition is closed under these laws).

1. Commutativity

a.join(b) ≡ b.join(a)

The order in which you receive updates doesn't matter. Agent A sending its state to B produces the same result as B sending to A first.

2. Associativity

a.join(b.join(c)) ≡ (a.join(b)).join(c)

You can batch updates in any grouping. Receiving 10 updates one at a time is equivalent to receiving them all batched, or in any intermediate grouping.

3. Idempotency

a.join(a) ≡ a

Receiving the same update twice is harmless. Deduplication is not required.

These three laws together mean: no matter how your network behaves — delays, reorders, duplicates — all replicas eventually reach the same state once they have seen all updates.

Primitive Reference

All primitives are in muse/core/crdts/. Import them from the package:

from muse.core.crdts import VectorClock, LWWRegister, ORSet, RGA, AWMap, GCounter

All primitives are immutable — every mutating method returns a new instance. This makes them safe to use as values in TypedDict fields and easy to test.

VectorClock

File: muse/core/crdts/vclock.py

A vector clock assigns a logical timestamp to each agent. It answers "does event A causally precede event B?" without requiring a synchronized wall clock.

Use for: tracking causal ordering between agents. Required by all other CRDTs implicitly when you need to know which write was "more recent."

API

vc = VectorClock()                    # empty — all agents at tick 0

# Increment agent's own clock before a write
vc2 = vc.increment("agent-1")        # {"agent-1": 1}

# Merge with a clock received from another agent
merged = vc.merge(other_vc)          # take max per agent

# Causal comparison
vc_a.happens_before(vc_b)            # True if vc_a ≤ vc_b (strictly before)
vc_a.concurrent_with(vc_b)           # True if neither precedes the other
vc_a.equivalent(vc_b)                # True if all per-agent ticks are equal

# Serialisation
d = vc.to_dict()                     # {"agent-1": 1, "agent-2": 3}
vc3 = VectorClock.from_dict(d)

When to use

Always embed a VectorClock in your CRDTSnapshotManifest["vclock"] field. It tracks which writes from which agents have been seen, enabling you to detect concurrent writes and apply correct merge semantics.

LWWRegister

File: muse/core/crdts/lww_register.py

A register holding a single value. When two agents write concurrently, the one with the higher timestamp wins ("Last Write Wins").

Use for: scalar values where recency is the right semantic — tempo, a mode enum, a display name, a configuration flag. Not appropriate when concurrent writes represent genuinely independent work that should both be preserved.

API

reg: LWWRegister[float] = LWWRegister()

# Write a new value (timestamp should be monotonically increasing per agent)
reg2 = reg.write(120.0, timestamp=1700000000.0, author="agent-1")

# Read current value
val = reg2.read()              # 120.0

# Join two registers — higher timestamp wins
merged = reg2.join(other_reg)

# Serialisation
d = reg.to_dict()              # {"value": ..., "timestamp": ..., "author": ...}
reg3 = LWWRegister[float].from_dict(d)

reg2.equivalent(reg3)          # True if same value/timestamp/author

Warning on timestamps

LWW correctness depends on timestamps being reasonably monotone. In a distributed system with clock skew, use logical timestamps (derived from a VectorClock) rather than wall time.

ORSet

File: muse/core/crdts/or_set.py

An Observed-Remove Set. Elements can be added and removed, but concurrent adds win over concurrent removes. This is the opposite of a naive set where removes win.

Why adds-win? In collaborative scenarios, a concurrent remove means "I didn't know you were going to add that" — not "I decided to delete your add." Adds-win semantics prevent silent data loss.

Use for: annotation sets, tag collections, member lists, gene ontology terms, feature flags — any unordered collection where concurrent adds should be preserved.

API

s: ORSet[str] = ORSet()

# Add with a unique token (random hex or agent+timestamp combination)
s2 = s.add("annotation-GO:0001234", token="agent1-tick42")

# Remove by value (removes all tokens for that element)
s3 = s2.remove("annotation-GO:0001234")

# Query
"annotation-GO:0001234" in s2        # True
s2.elements()                        # frozenset({"annotation-GO:0001234"})
s2.tokens_for("annotation-GO:0001234")  # frozenset({"agent1-tick42"})

# Join — union of all add-tokens, then subtract remove-tokens
merged = s2.join(other_set)

# Serialisation
d = s.to_dict()
s4 = ORSet[str].from_dict(d)

s2.equivalent(s3)               # True if same elements and tokens

Concurrent add + remove example

Agent A: s.add("X", token="a1")
Agent B: s.remove("X")        (before seeing A's add)

After join:
  A's add token "a1" is present
  B's remove only targets tokens B has seen — not "a1"
  Result: "X" is in the merged set ✓

RGA

File: muse/core/crdts/rga.py

A Replicated Growable Array — a list where concurrent insertions are commutative. Two agents can insert at the same logical position; the result is deterministic based on element_id ordering (larger ID appears first).

Use for: collaborative text editing, ordered note sequences, ordered event streams, any sequence where multiple agents might insert concurrently.

API

rga: RGA[str] = RGA()

# Insert after the virtual root (parent_id=None means "at the beginning")
rga2 = rga.insert(value="C4", element_id="id-100", parent_id=None)
rga3 = rga2.insert(value="D4", element_id="id-200", parent_id="id-100")
rga4 = rga3.insert(value="E4", element_id="id-300", parent_id="id-200")

# Delete by element_id (tombstones the element, does not shift IDs)
rga5 = rga4.delete("id-200")

# Read current sequence (tombstoned elements excluded)
rga4.to_sequence()    # ["C4", "D4", "E4"]
rga5.to_sequence()    # ["C4", "E4"]

len(rga4)             # 3

# Join — builds parent-ID tree, traverses in canonical order
merged = rga4.join(other_rga)

# Serialisation
d = rga.to_dict()
rga6 = RGA[str].from_dict(d)

rga4.equivalent(rga6)    # True if same elements in same order

How concurrent insertions resolve

Initial: ["A", "C"]  (A at id-100, C at id-300)

Agent 1: inserts "B" at id-200, parent_id="id-100"
Agent 2: inserts "X" at id-250, parent_id="id-100"

After join (same parent "id-100", id-250 > id-200):
  Result: ["A", "X", "B", "C"]
  (larger element_id appears first among siblings)

To get a specific ordering, choose element_id values accordingly. For sequential insertions from a single agent, monotonically increasing IDs produce the expected order.

AWMap

File: muse/core/crdts/aw_map.py

An Add-Wins Map. A dictionary where concurrent adds win over concurrent removes, and each key is managed independently (adding a key does not conflict with removing a different key).

Use for: parameter maps, configuration dicts, per-dimension metadata, named dimension states, any key-value structure where concurrent writes to different keys should not conflict.

API

m: AWMap[str, float] = AWMap()

# Set a key-value pair (uses an ORSet internally per key)
m2 = m.set("tempo", 120.0, token="agent1-t1")
m3 = m2.set("key_sig", 0.0, token="agent1-t2")

# Remove a key
m4 = m3.remove("key_sig")

# Query
m3.get("tempo")          # 120.0
m3.get("missing")        # None
"tempo" in m3            # True
m3.keys()                # frozenset({"tempo", "key_sig"})

# Flatten to plain dict
m3.to_plain_dict()       # {"tempo": 120.0, "key_sig": 0.0}

# Join — union of all add-sets per key, removes applied per key
merged = m3.join(other_map)

# Serialisation
d = m.to_dict()
m5 = AWMap[str, float].from_dict(d)

m3.equivalent(m4)        # True if same key-value pairs

GCounter

File: muse/core/crdts/g_counter.py

A grow-only counter. Each agent increments its own shard; the global value is the sum. Decrement is not possible — this is by design. Counters that can only grow are trivially convergent.

Use for: event counts, version numbers, message sequence numbers, commit counts, read counts — any monotonically increasing quantity.

API

gc = GCounter()

# Increment this agent's shard
gc2 = gc.increment("agent-1")
gc3 = gc2.increment("agent-1")
gc4 = gc3.increment("agent-2")

gc4.value()                   # 3
gc4.value_for("agent-1")      # 2
gc4.value_for("agent-2")      # 1
gc4.value_for("agent-99")     # 0

# Join — take max per agent shard
merged = gc4.join(other_counter)

# Serialisation
d = gc.to_dict()              # {"agent-1": 2, "agent-2": 1}
gc5 = GCounter.from_dict(d)

gc4.equivalent(gc5)           # True if same per-agent values

Combining Primitives

Complex CRDT state is built by composing primitives. The composition inherits the lattice laws because each primitive satisfies them and because join is applied field-by-field.

Example: a collaborative score header

@dataclass
class ScoreHeaderCRDT:
    """Convergent score header: tempo register + time_sig register + author set."""

    tempo: LWWRegister[float]
    time_sig: LWWRegister[str]
    authors: ORSet[str]

    def join(self, other: ScoreHeaderCRDT) -> ScoreHeaderCRDT:
        return ScoreHeaderCRDT(
            tempo=self.tempo.join(other.tempo),
            time_sig=self.time_sig.join(other.time_sig),
            authors=self.authors.join(other.authors),
        )

Because LWWRegister.join and ORSet.join both satisfy the three laws, ScoreHeaderCRDT.join does too — for free, by composition.

CRDTPlugin Integration

The entry point in the core engine is crdt_join_snapshots() in muse/core/merge_engine.py. The muse merge command calls it when isinstance(plugin, CRDTPlugin) is True:

from muse.core.merge_engine import crdt_join_snapshots
from muse.domain import CRDTPlugin, CRDTSnapshotManifest

# In merge_engine.py — called by the merge command
def crdt_join_snapshots(
    plugin: CRDTPlugin,
    ours: StateSnapshot,
    theirs: StateSnapshot,
) -> MergeResult:
    crdt_a = plugin.to_crdt_state(ours)
    crdt_b = plugin.to_crdt_state(theirs)
    joined = plugin.join(crdt_a, crdt_b)
    merged_snapshot = plugin.from_crdt_state(joined)
    return MergeResult(
        merged=merged_snapshot,
        conflicts=[],          # always empty — CRDT join never conflicts
        applied_strategies={},
        dimension_reports={},
    )

Notice conflicts=[] is always empty. This is the CRDT guarantee: no human intervention is ever required.

Performance Notes

Primitive	Join complexity	Storage
`VectorClock`	O(agents)	One int per agent
`LWWRegister`	O(1)	One value + timestamp
`ORSet`	O(n + m) tokens	One random hex token per add operation
`RGA`	O(n log n)	One node per insert (tombstones retained)
`AWMap`	O(keys × tokens)	Per-key ORSet overhead
`GCounter`	O(agents)	One int per agent

RGA memory warning: RGA retains tombstoned elements forever (this is required for commutativity). For domains with high churn (many inserts and deletes), implement periodic garbage collection by taking a snapshot of the live sequence, creating a fresh RGA, and re-inserting only the live elements. This is a safe operation because garbage collection only affects elements both sides have observed as deleted — a coordination-free safe point.

When Not to Use CRDTs

CRDTs are not always the right choice. Use three-way merge (Phases 1–3) when:

Humans are making creative decisions — a DAW producer choosing a chord voicing should not have their choice silently overwritten by a LWW timestamp. Use OT merge with conflicts.
The domain has invariants that CRDTs cannot enforce — CRDTs converge, but they can produce semantically invalid states. A MIDI file with notes outside the pitch range 0–127 is technically convergent but musically invalid. Invariant enforcement requires coordination.
Conflict visibility is a feature — in code review, you want conflicts to be visible to humans. "This merge is clean" is meaningful precisely because conflicts exist.
You have a clear authority model — if one agent is the "source of truth," LWW with that agent always winning is fine. But that's a policy, not a CRDT.

Use CRDTs when all of the following are true:

Many agents write concurrently (more than humans can coordinate)
No single agent is the authority
All writes are semantically valid in isolation
Convergence is more important than precision

File History 1 commit