Developer Docs Agent Coordination
PHASE 06

Agent Coordination

Muse's coordination layer lets multiple agents work a shared codebase in parallel without stepping on each other. It provides file-backed work queues with POSIX-atomic claiming, advisory symbol reservations with heartbeat keep-alive, a dependency DAG for ordering, import-graph sharding for zero-interference parallelism, and a three-pass conflict forecast before a single line of code changes.

File-backed, no broker required — all coordination state lives in .muse/coordination/ as content-addressed JSON. Remote sync to MuseHub is optional; everything works offline.

Overview

Two independent primitives compose the system:

PrimitivePurposeAtomicity
Task queue Distribute discrete units of work across agents. Producer enqueues; consumers race to claim; winner completes or fails. open(O_CREAT|O_EXCL) — POSIX-guaranteed single winner
Reservations Advisory symbol-address leases. Agents declare intent before touching a symbol; muse coord forecast detects overlaps before they happen. write_text_atomic() (atomic rename); content-addressed ID prevents duplicates

A typical multi-agent pipeline:

# Orchestrator shards the repo into N parallel zones
muse coord shard --agents 4 --json

# Orchestrator enqueues one task per shard
muse coord enqueue "Refactor shard 1" --queue refactor \
  --payload '{"files":["src/billing.py","src/models.py"]}' \
  --run-id orchestrator --json

# Each worker claims one task
muse coord claim --queue refactor --run-id agent-1 --json

# Worker reserves the symbols it will touch
muse coord reserve "src/billing.py::compute_total" \
  --run-id agent-1 --op modify --ttl 7200 --json

# Forecast before any writes
muse coord forecast --json

# Work … commit … then release and complete
muse coord release <reservation_id> --run-id agent-1 --json
muse coord complete <task_id> --run-id agent-1 --json

Multi-agent scenario — orchestrator + 3 workers

This scenario shows a complete pipeline: one orchestrator shards the repo, enqueues three tasks, three workers race to claim them, a forecast catches a symbol overlap before any code changes, and the orchestrator collects all results.

Step 1 — shard and enqueue

bash orchestrator — shard into 3 zones, enqueue 3 tasks
# 1. Split the import graph into 3 parallel-safe zones
muse coord shard --agents 3 --json \
  | python3 -c "import sys,json; [print(s['shard'], s['files']) for s in json.load(sys.stdin)['shards']]"
# 1 ['src/billing.py', 'src/models.py']
# 2 ['src/auth.py', 'src/tokens.py']
# 3 ['src/api.py', 'src/middleware.py']

# 2. Enqueue one task per shard
muse coord enqueue "Refactor shard 1" --queue refactor --priority 10 \
  --payload '{"files":["src/billing.py","src/models.py"]}' \
  --run-id orchestrator --json
muse coord enqueue "Refactor shard 2" --queue refactor --priority 10 \
  --payload '{"files":["src/auth.py","src/tokens.py"]}' \
  --run-id orchestrator --json
muse coord enqueue "Refactor shard 3" --queue refactor --priority 10 \
  --payload '{"files":["src/api.py","src/middleware.py"]}' \
  --run-id orchestrator --json

Step 2 — workers race to claim

bash three workers run this concurrently — each gets a different task
# Each worker claims from the refactor queue (POSIX-atomic, one winner per task)
TASK=$(muse coord claim --queue refactor --run-id agent-1 --json)
TASK_ID=$(echo "$TASK" | python3 -c "import sys,json; print(json.load(sys.stdin)['task_id'])")
FILES=$(echo "$TASK" | python3 -c "import sys,json; print(json.load(sys.stdin)['payload']['files'])")
json agent-1 claim response
{
  "task_id":          "sha256:4e8b2f1a3c7d...",
  "title":            "Refactor shard 1",
  "queue":            "refactor",
  "claimer_run_id":   "agent-1",
  "claimed_at":       "2026-04-21T16:00:00Z",
  "expires_at":       "2026-04-21T17:00:00Z",
  "payload":          { "files": ["src/billing.py", "src/models.py"] }
}

Step 3 — reserve symbols, run forecast

bash each worker reserves before touching — agent-1 and agent-2 both reserve compute_total
# agent-1: billing shard
muse coord reserve "src/billing.py::compute_total" \
  --run-id agent-1 --op modify --ttl 7200 --json

# agent-2: auth shard — also touches compute_total (import chain)
muse coord reserve "src/billing.py::compute_total" \
  --run-id agent-2 --op rename --ttl 7200 --json

# orchestrator: forecast before any agent writes code
muse coord forecast --json
json forecast — overlap detected
{
  "active_reservations": 3,
  "call_graph_available": true,
  "partial_forecast":    false,
  "conflicts": [
    {
      "conflict_type": "address_overlap",
      "addresses":    ["src/billing.py::compute_total"],
      "agents":       ["agent-1@feat/refactor", "agent-2@feat/auth"],
      "confidence":   1.0,
      "description": "Symbol reserved by 2 agents on different branches"
    },
    {
      "conflict_type": "operation_conflict",
      "addresses":    ["src/billing.py::compute_total"],
      "agents":       ["agent-1@feat/refactor", "agent-2@feat/auth"],
      "confidence":   0.9,
      "description": "Incompatible operations: modify vs rename on same symbol"
    }
  ],
  "high_risk":   2,
  "medium_risk": 0,
  "low_risk":    0
}

Orchestrator sees the conflict and serializes the work: agent-1 renames first, agent-2's reservation is updated to depends-on agent-1's:

bash orchestrator resolves the overlap — serialize via depends-on
# Agent-2 releases its reservation and re-reserves with depends-on
muse coord release <agent-2-reservation-id> --run-id agent-2 --json
muse coord reserve "src/billing.py::compute_total" \
  --run-id agent-2 --op modify --ttl 7200 \
  --depends-on <agent-1-reservation-id> --json

Step 4 — work, commit, release, complete

bash each worker — do work, commit, then release and complete
# ... agent does work on its branch ...
muse commit -m "refactor: shard 1 billing module" \
  --agent-id claude-code --model-id claude-sonnet-4-6 --sign

# Release all reservations for this run
muse coord release --all-for-run agent-1 --run-id agent-1 --json

# Complete the task with a structured result
muse coord complete sha256:4e8b2f1a3c7d... \
  --run-id agent-1 \
  --result '{"symbols_modified": 12, "tests_passing": true}' \
  --json

Step 5 — orchestrator collects results

bash orchestrator — wait for all tasks to complete
muse coord tasks --queue refactor --status completed --json \
  | python3 -c "
import sys, json
data = json.load(sys.stdin)
for t in data['tasks']:
    print(t['title'], '→', t['result'])
"
text output
Refactor shard 1 → {'symbols_modified': 12, 'tests_passing': True}
Refactor shard 2 → {'symbols_modified': 8,  'tests_passing': True}
Refactor shard 3 → {'symbols_modified': 5,  'tests_passing': True}

Work queues

Tasks are content-addressed by their inputs — same title + queue + payload + priority + creator always produces the same task_id. Queues are named strings; any alphanumeric-plus-hyphen-or-underscore name up to 64 chars is valid.

TaskRecord fields

FieldTypeDescription
task_idstr (sha256:…)Content-addressed; same inputs → same ID (idempotent enqueue)
titlestr ≤256Short human-readable description of the work unit
payloaddictArbitrary JSON — agent-specific data passed to the claimer
priorityintHigher = claimed first; ties broken by created_at FIFO
queuestr ≤64Queue name [a-zA-Z0-9_-]+; default "default"
created_atdatetime (UTC)Enqueue timestamp
created_bystr ≤256run_id of the enqueuing agent
ttl_secondsintSeconds from created_at before pending task expires; default 86400 (24h)
tagslist[str]Up to 32 tags, each ≤64 chars; for filtering and grouping
muse coord enqueue "Lint billing module" \
  --queue lint \
  --priority 10 \
  --payload '{"file": "src/billing.py"}' \
  --ttl 3600 \
  --tags billing python \
  --run-id orch --json

Claim lifecycle

Task state is derived from two files: the task record (immutable) and the claim record (mutable). The claim file is created with open(O_CREAT|O_EXCL) — POSIX guarantees exactly one agent wins the race; all others get FileExistsError.

pending ──[O_CREAT|O_EXCL, one winner]──→ claimed
                                               │
                   ┌───────────────────────────┼────────────────┐
                   ↓                           ↓                ↓
             completed                      failed          cancelled
            (result set)               (error set)

           ↑ Re-claim path (timed_out):
           claimed ──[expires_at ≤ now]──→ timed_out ──[write_atomic + nonce]──→ claimed

ClaimRecord fields

FieldTypeDescription
task_idstr (sha256:…)Links to the immutable TaskRecord
claimer_run_idstr ≤256Agent that claimed the task
claimed_atdatetime (UTC)Initial claim timestamp
expires_atdatetime (UTC)Claim TTL; default 3600s; extended by heartbeat
statusstr"claimed" "completed" "failed" "cancelled" "timed_out"
heartbeat_atdatetime (UTC)Last heartbeat timestamp
claim_noncestrOptimistic concurrency token for re-claim race resolution
resultdict | nullSet by muse coord complete --result JSON
errorstr | nullSet by muse coord fail-task --error MSG

Claiming and re-claiming

muse coord claim scans the queue sorted by priority descending, then created_at ascending (FIFO within a priority band). It tries each candidate in order:

  1. Pending taskopen(O_CREAT|O_EXCL). POSIX-atomic. First agent to call wins; rest loop to the next candidate.
  2. Timed-out taskwrite_text_atomic with a fresh claim_nonce, then read back. If the nonce matches, the agent won the re-claim race. If not, another agent beat it; loop to the next candidate.
muse coord claim --queue refactor --run-id agent-1 --json

muse coord complete <task_id> --run-id agent-1 \
  --result '{"symbols_renamed": 7}' --json

muse coord fail-task <task_id> --run-id agent-1 \
  --error "AST parse failed on line 42" --json

muse coord cancel-task <task_id> --run-id orch --json

muse coord tasks --queue refactor --status pending --json
Only the claimer_run_id may call complete, fail-task, or heartbeat on a claim. Orchestrators use cancel-task.

Reservations

A reservation is an advisory symbol-address lease. It does not prevent writes (no lock server required) — it signals intent so muse coord forecast can detect overlaps before they produce merge conflicts.

ReservationRecord fields

FieldTypeDescription
reservation_idstr (sha256:…)Content-addressed: SHA-256(run_id, branch, sorted(addresses), operation)
run_idstr ≤256Agent identifier — use your pipeline ID
branchstrBranch this agent is working on
addresseslist[str]Symbol addresses; fnmatch glob patterns supported ("billing.py::*")
created_atdatetime (UTC)Reservation creation time
expires_atdatetime (UTC)Hard TTL expiry; extended by heartbeat
operationstr | null"modify" "rename" "delete" "extract" "move" — used by Tier 3 forecast
The reservation ID is content-addressed — calling muse coord reserve twice with the same arguments returns the same ID and writes a single file. Idempotent.
# reserve one symbol
muse coord reserve "src/billing.py::compute_total" \
  --run-id agent-1 --op modify --ttl 7200 --json

# reserve an entire file with a glob
muse coord reserve "src/billing.py::*" \
  --run-id agent-1 --op rename --json

# release when done
muse coord release <reservation_id> --run-id agent-1 --json

# release everything for a run (end of pipeline)
muse coord release --all-for-run agent-1 --run-id agent-1 --json

Active-reservation check

A reservation is active when all of the following hold:

  1. No release tombstone exists for reservation_id
  2. now < max(expires_at, heartbeat.extended_expires_at)

Once a release tombstone is written it is checked before full JSON parsing — the stem-only scan makes active-check O(n) in file count, not file size.

Heartbeat — TTL extension

Long-running agents extend their reservations and claim TTLs with heartbeats. Each heartbeat atomically rewrites a single file with a new extended_expires_at.

FieldDescription
reservation_idWhich reservation this keep-alive extends
run_idMust match the reservation's run_id
last_beat_atThis heartbeat's timestamp
extended_expires_atnow + extension_seconds — new effective expiry
# extend reservation TTL
muse coord heartbeat <reservation_id> --run-id agent-1 --json

# extend task claim TTL
muse coord heartbeat <task_id> --run-id agent-1 --json

# recommended poll interval: extension_seconds / 2
# default extension: 3600s → heartbeat every ~1800s

Reservation TTL defaults: 3600s. Claim TTL defaults: 3600s. Task pending TTL defaults: 86400s (24h). Extension range: 1s – 31,536,000s (1 year).

Dependency DAG

Reservations can declare dependencies on other reservations — "I cannot start until agent-B's rename is complete." The DAG enforces ordering without a central scheduler. Acyclicity is checked at write time; cycles are structurally impossible on disk.

Three-step dependency chain

A rename must happen before modify, which must happen before the test update. Each reservation declares its predecessor with --depends-on:

bash build a rename → modify → test chain
# Step 1 — agent-A renames the symbol (no dependency)
RENAME_ID=$(muse coord reserve "src/billing.py::compute_total" \
  --run-id agent-A --op rename --json \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['reservation_id'])")

# Step 2 — agent-B modifies the renamed symbol (waits for step 1)
MODIFY_ID=$(muse coord reserve "src/billing.py::compute_invoice_total" \
  --run-id agent-B --op modify --depends-on $RENAME_ID --json \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['reservation_id'])")

# Step 3 — agent-C updates tests (waits for step 2)
muse coord reserve "tests/test_billing.py::test_total" \
  --run-id agent-C --op modify --depends-on $MODIFY_ID --json

Check whether agent-B is blocked before it starts work:

bash inspect the DAG
muse coord dag --json
json DAG output — topological order and blocked nodes
{
  "nodes": [
    { "reservation_id": "a1b2c3...", "run_id": "agent-A",
      "addresses": ["src/billing.py::compute_total"],
      "operation": "rename", "status": "active",   "depends_on": [] },
    { "reservation_id": "d4e5f6...", "run_id": "agent-B",
      "addresses": ["src/billing.py::compute_invoice_total"],
      "operation": "modify", "status": "blocked", "depends_on": ["a1b2c3..."] },
    { "reservation_id": "g7h8i9...", "run_id": "agent-C",
      "addresses": ["tests/test_billing.py::test_total"],
      "operation": "modify", "status": "blocked", "depends_on": ["d4e5f6..."] }
  ],
  "edges": [
    { "from": "a1b2c3...", "to": "d4e5f6...", "reason": "depends_on" },
    { "from": "d4e5f6...", "to": "g7h8i9...", "reason": "depends_on" }
  ],
  "topological_order": ["a1b2c3...", "d4e5f6...", "g7h8i9..."],
  "cycles": [],
  "active_count":  1,
  "blocked_count": 2
}
text visual (topological order — safe execution sequence)
agent-A  rename  compute_total           [ACTIVE]
  └─[depends_on]──▶ agent-B  modify  compute_invoice_total  [BLOCKED]
                      └─[depends_on]──▶ agent-C  modify  test_total  [BLOCKED]
bash inspect and release commands
# inspect the full DAG
muse coord dag --json

# show only active (unblocked) nodes
muse coord dag --active-only --json

DependencyRecord fields

FieldDescription
reservation_idThe dependent — this reservation waits
depends_onList of reservation IDs it waits for (max 256)
created_atUTC timestamp

is_blocked(reservation_id) returns true if any dependency is still active. get_blocking(reservation_id) returns the list of active dependencies. DAG output includes topological order (dependencies before dependents — safe execution sequence) and detects cycles if one somehow exists.

Sharding

muse coord shard partitions the import graph into parallel-safe work zones. Agents assigned to different shards never touch the same files, eliminating merge conflicts by construction.

Algorithm

  1. Build the import graph from the HEAD snapshot: file → file edges from import pseudo-symbols.
  2. Compute weakly-connected components via DFS.
  3. Apply LPT (Longest Processing Time) heuristic: greedily assign the largest component to the shard with the fewest symbols so far, minimizing makespan.
  4. Count cross_shard_edges: imports that cross shard boundaries (lower = better isolation).
muse coord shard --agents 4 --json
muse coord shard --agents 8 --language Python --json
muse coord shard --agents 4 --commit HEAD~10 --json

ShardPlan output

{
  "commit": "f3c14a89",            // 8-char display ID
  "full_commit_id": "sha256:f3c14a89cd37…",
  "agents": 4,
  "shards_created": 4,           // may be < agents if repo has fewer components
  "total_files": 87,
  "total_symbols": 1204,
  "cross_shard_edges": 3,         // lower = better isolation
  "shards": [
    {
      "shard": 1,
      "files": ["src/billing.py", "src/models.py"],
      "symbol_count": 312,
      "coupling_score": 1           // edges from this shard to others
    }
  ]
}
If the repo has fewer weakly-connected components than --agents, shards_created < agents. The extra agents have nothing to claim and should idle or handle a different queue.

Conflict forecasting

muse coord forecast predicts merge conflicts before any agent writes code. It runs three independent passes over active reservations and intents, each with a calibrated confidence score.

PassConflict typeConfidenceCondition
1 "address_overlap" 1.0 Same symbol address reserved by two or more different run_ids
2 "blast_radius_overlap" 0.75 One reserved address is a transitive caller of another (requires muse code index); skipped if call graph unavailable
3 "operation_conflict" 0.9 Intents on the same address combine incompatible operations: delete vs modify, rename, or extract
muse coord forecast --json
muse coord forecast --branch feat/auth --json
muse coord forecast --min-confidence 0.9 --json   # high-risk only

Output

{
  "active_reservations": 12,
  "call_graph_available": true,
  "partial_forecast": false,
  "conflicts": [
    {
      "conflict_type": "address_overlap",
      "addresses": ["src/billing.py::compute_total"],
      "agents": ["agent-1@feat/auth", "agent-2@feat/payments"],
      "confidence": 1.0,
      "description": "Symbol reserved by 2 agents on different branches"
    }
  ],
  "high_risk": 1,    // confidence >= 0.9
  "medium_risk": 0,  // 0.5 <= confidence < 0.9
  "low_risk": 0      // confidence < 0.5
}

Agents are shown as "run_id@branch". Pass 2 is silently skipped when the code intelligence index is unavailable, setting partial_forecast: true.

Watch — real-time event stream

muse coord watch streams coordination events as NDJSON (one JSON object per line). On macOS/BSD it uses kqueue (zero CPU between events); on Linux it falls back to polling at a configurable interval (default 1.0s).

muse coord watch --json
muse coord watch --once --json              # exit after first event
muse coord watch --kind reservation --json  # filter by kind
muse coord watch --run-id agent-1 --json   # filter by agent

Event types and fields

FieldValuesDescription
event_type"snapshot" "added" "modified" "removed" "expired"What happened to the record
kind"reservation" "intent" "release" "heartbeat"Type of coordination record
idstrID of the changed record
timestampdatetime (UTC)When the event occurred
datadictParsed record content; empty dict for removed records with no cache

"snapshot" events fire on startup for each existing record — useful for replaying current state before watching for changes.

Remote sync

Coordination state can be pushed to and pulled from MuseHub, enabling multi-machine agent pipelines. Remote records are stored read-only under .muse/coordination/remote/.

# push local coordination records to MuseHub (batches of 500)
muse coord sync push \
  --hub http://staging.musehub.ai \
  --owner gabriel --slug muse --json

# pull remote records (paginated, use --since-id for incremental)
muse coord sync pull \
  --hub http://staging.musehub.ai \
  --owner gabriel --slug muse \
  --since-id 1024 --limit 500 --json

Push is idempotent: the hub responds with {"inserted": N, "skipped": M}. Pull returns a cursor to use as --since-id in the next call. Supported kinds: reservation, intent, release, heartbeat, task, claim, dependency.

Storage layout

All coordination state is file-backed under .muse/coordination/. Every file is JSON. IDs are content-addressed sha256 hex strings used directly as filenames.

.muse/coordination/
  reservations/<sha256-hex>.json     ← write-once advisory symbol lease
  intents/<sha256-hex>.json          ← write-once operation declaration (no TTL)
  releases/<sha256-hex>.json         ← write-once tombstone (completed/cancelled)
  heartbeats/<sha256-hex>.json       ← mutable keep-alive; atomically overwritten
  tasks/<sha256-hex>.json            ← immutable task definition
  claims/<sha256-hex>.json           ← mutable claim state (O_CREAT|O_EXCL then atomic)
  dependencies/<sha256-hex>.json     ← write-once DAG edges
  remote/                            ← records synced from MuseHub (read-only)

GC policy

muse coord gc collects stale records after a configurable grace period (default 300s = 5 min, protecting against agents mid-read). Run dry-run first:

muse coord gc --json           # dry-run, shows what would be collected
muse coord gc --execute --json  # actually delete
CollectedCondition
Expired reservationsTTL exhausted (including heartbeat extensions), past grace period
Released reservationsRelease tombstone present, older than grace period
Orphaned releasesTombstone exists but reservation file gone
Orphaned heartbeatsHeartbeat exists but reservation file gone
Old intents (opt-in)--include-intents; older than --max-intent-age (default 7 days)

CLI reference

All commands accept --json.

TaskCommand
Enqueue a taskmuse coord enqueue "title" --run-id orch --json
Claim next pending taskmuse coord claim --run-id agent-1 --json
Claim from specific queuemuse coord claim --queue refactor --run-id agent-1 --json
Complete a taskmuse coord complete <task_id> --run-id agent-1 --json
Fail a taskmuse coord fail-task <task_id> --run-id agent-1 --error "msg" --json
Cancel a taskmuse coord cancel-task <task_id> --run-id orch --json
List tasksmuse coord tasks --status pending --json
Reserve symbol addressesmuse coord reserve "billing.py::Fn" --run-id agent-1 --op modify --json
Reserve with TTLmuse coord reserve "billing.py::Fn" --run-id agent-1 --ttl 7200 --json
Reserve with dependencymuse coord reserve "billing.py::Fn" --run-id agent-B --depends-on <id> --json
Release a reservationmuse coord release <id> --run-id agent-1 --json
Release all for a runmuse coord release --all-for-run agent-1 --run-id agent-1 --json
Heartbeat (reservation)muse coord heartbeat <reservation_id> --run-id agent-1 --json
Heartbeat (claim)muse coord heartbeat <task_id> --run-id agent-1 --json
List active reservationsmuse coord list --json
Inspect DAGmuse coord dag --json
Shard codebasemuse coord shard --agents 4 --json
Forecast conflictsmuse coord forecast --json
Reconcile merge ordermuse coord reconcile --json
Watch eventsmuse coord watch --json
Push to MuseHubmuse coord sync push --hub URL --owner O --slug S --json
Pull from MuseHubmuse coord sync pull --hub URL --owner O --slug S --json
GC stale records (dry-run)muse coord gc --json
GC (execute)muse coord gc --execute --json

Default TTLs

EntityDefault TTL
Reservation3600s (1h)
Task claim3600s (1h)
Task pending86400s (24h)
Heartbeat extension3600s
IntentNo TTL — permanent until GC with --include-intents
GC grace period300s (5 min)