Agent Coordination — Muse Developer Docs

Overview

Two independent primitives compose the system:

Primitive	Purpose	Atomicity
Task queue	Distribute discrete units of work across agents. Producer enqueues; consumers race to claim; winner completes or fails.	`open(O_CREAT\|O_EXCL)` — POSIX-guaranteed single winner
Reservations	Advisory symbol-address leases. Agents declare intent before touching a symbol; `muse coord forecast` detects overlaps before they happen.	`write_text_atomic()` (atomic rename); content-addressed ID prevents duplicates

A typical multi-agent pipeline:

# Orchestrator shards the repo into N parallel zones
muse coord shard --agents 4 --json

# Orchestrator enqueues one task per shard
muse coord enqueue "Refactor shard 1" --queue refactor \
  --payload '{"files":["src/billing.py","src/models.py"]}' \
  --run-id orchestrator --json

# Each worker claims one task
muse coord claim --queue refactor --run-id agent-1 --json

# Worker reserves the symbols it will touch
muse coord reserve "src/billing.py::compute_total" \
  --run-id agent-1 --op modify --ttl 7200 --json

# Forecast before any writes
muse coord forecast --json

# Work … commit … then release and complete
muse coord release <reservation_id> --run-id agent-1 --json
muse coord complete <task_id> --run-id agent-1 --json

Multi-agent scenario — orchestrator + 3 workers

This scenario shows a complete pipeline: one orchestrator shards the repo, enqueues three tasks, three workers race to claim them, a forecast catches a symbol overlap before any code changes, and the orchestrator collects all results.

Step 1 — shard and enqueue

bash orchestrator — shard into 3 zones, enqueue 3 tasks

# 1. Split the import graph into 3 parallel-safe zones
muse coord shard --agents 3 --json \
  | python3 -c "import sys,json; [print(s['shard'], s['files']) for s in json.load(sys.stdin)['shards']]"
# 1 ['src/billing.py', 'src/models.py']
# 2 ['src/auth.py', 'src/tokens.py']
# 3 ['src/api.py', 'src/middleware.py']

# 2. Enqueue one task per shard
muse coord enqueue "Refactor shard 1" --queue refactor --priority 10 \
  --payload '{"files":["src/billing.py","src/models.py"]}' \
  --run-id orchestrator --json
muse coord enqueue "Refactor shard 2" --queue refactor --priority 10 \
  --payload '{"files":["src/auth.py","src/tokens.py"]}' \
  --run-id orchestrator --json
muse coord enqueue "Refactor shard 3" --queue refactor --priority 10 \
  --payload '{"files":["src/api.py","src/middleware.py"]}' \
  --run-id orchestrator --json

Step 2 — workers race to claim

bash three workers run this concurrently — each gets a different task

# Each worker claims from the refactor queue (POSIX-atomic, one winner per task)
TASK=$(muse coord claim --queue refactor --run-id agent-1 --json)
TASK_ID=$(echo "$TASK" | python3 -c "import sys,json; print(json.load(sys.stdin)['task_id'])")
FILES=$(echo "$TASK" | python3 -c "import sys,json; print(json.load(sys.stdin)['payload']['files'])")

json agent-1 claim response

{
  "task_id":          "sha256:4e8b2f1a3c7d...",
  "title":            "Refactor shard 1",
  "queue":            "refactor",
  "claimer_run_id":   "agent-1",
  "claimed_at":       "2026-04-21T16:00:00Z",
  "expires_at":       "2026-04-21T17:00:00Z",
  "payload":          { "files": ["src/billing.py", "src/models.py"] }
}

Step 3 — reserve symbols, run forecast

bash each worker reserves before touching — agent-1 and agent-2 both reserve compute_total

# agent-1: billing shard
muse coord reserve "src/billing.py::compute_total" \
  --run-id agent-1 --op modify --ttl 7200 --json

# agent-2: auth shard — also touches compute_total (import chain)
muse coord reserve "src/billing.py::compute_total" \
  --run-id agent-2 --op rename --ttl 7200 --json

# orchestrator: forecast before any agent writes code
muse coord forecast --json

json forecast — overlap detected

{
  "active_reservations": 3,
  "call_graph_available": true,
  "partial_forecast":    false,
  "conflicts": [
    {
      "conflict_type": "address_overlap",
      "addresses":    ["src/billing.py::compute_total"],
      "agents":       ["agent-1@feat/refactor", "agent-2@feat/auth"],
      "confidence":   1.0,
      "description": "Symbol reserved by 2 agents on different branches"
    },
    {
      "conflict_type": "operation_conflict",
      "addresses":    ["src/billing.py::compute_total"],
      "agents":       ["agent-1@feat/refactor", "agent-2@feat/auth"],
      "confidence":   0.9,
      "description": "Incompatible operations: modify vs rename on same symbol"
    }
  ],
  "high_risk":   2,
  "medium_risk": 0,
  "low_risk":    0
}

Orchestrator sees the conflict and serializes the work: agent-1 renames first, agent-2's reservation is updated to depends-on agent-1's:

bash orchestrator resolves the overlap — serialize via depends-on

# Agent-2 releases its reservation and re-reserves with depends-on
muse coord release <agent-2-reservation-id> --run-id agent-2 --json
muse coord reserve "src/billing.py::compute_total" \
  --run-id agent-2 --op modify --ttl 7200 \
  --depends-on <agent-1-reservation-id> --json

Step 4 — work, commit, release, complete

bash each worker — do work, commit, then release and complete

# ... agent does work on its branch ...
muse commit -m "refactor: shard 1 billing module" \
  --agent-id claude-code --model-id claude-sonnet-4-6 --sign

# Release all reservations for this run
muse coord release --all-for-run agent-1 --run-id agent-1 --json

# Complete the task with a structured result
muse coord complete sha256:4e8b2f1a3c7d... \
  --run-id agent-1 \
  --result '{"symbols_modified": 12, "tests_passing": true}' \
  --json

Step 5 — orchestrator collects results

bash orchestrator — wait for all tasks to complete

muse coord tasks --queue refactor --status completed --json \
  | python3 -c "
import sys, json
data = json.load(sys.stdin)
for t in data['tasks']:
    print(t['title'], '→', t['result'])
"

text output

Refactor shard 1 → {'symbols_modified': 12, 'tests_passing': True}
Refactor shard 2 → {'symbols_modified': 8,  'tests_passing': True}
Refactor shard 3 → {'symbols_modified': 5,  'tests_passing': True}

Work queues

Tasks are content-addressed by their inputs — same title + queue + payload + priority + creator always produces the same task_id. Queues are named strings; any alphanumeric-plus-hyphen-or-underscore name up to 64 chars is valid.

TaskRecord fields

Field	Type	Description
`task_id`	str (sha256:…)	Content-addressed; same inputs → same ID (idempotent enqueue)
`title`	str ≤256	Short human-readable description of the work unit
`payload`	dict	Arbitrary JSON — agent-specific data passed to the claimer
`priority`	int	Higher = claimed first; ties broken by `created_at` FIFO
`queue`	str ≤64	Queue name `[a-zA-Z0-9_-]+`; default `"default"`
`created_at`	datetime (UTC)	Enqueue timestamp
`created_by`	str ≤256	`run_id` of the enqueuing agent
`ttl_seconds`	int	Seconds from `created_at` before pending task expires; default 86400 (24h)
`tags`	list[str]	Up to 32 tags, each ≤64 chars; for filtering and grouping

muse coord enqueue "Lint billing module" \
  --queue lint \
  --priority 10 \
  --payload '{"file": "src/billing.py"}' \
  --ttl 3600 \
  --tags billing python \
  --run-id orch --json

Claim lifecycle

Task state is derived from two files: the task record (immutable) and the claim record (mutable). The claim file is created with open(O_CREAT|O_EXCL) — POSIX guarantees exactly one agent wins the race; all others get FileExistsError.

pending ──[O_CREAT|O_EXCL, one winner]──→ claimed
                                               │
                   ┌───────────────────────────┼────────────────┐
                   ↓                           ↓                ↓
             completed                      failed          cancelled
            (result set)               (error set)

           ↑ Re-claim path (timed_out):
           claimed ──[expires_at ≤ now]──→ timed_out ──[write_atomic + nonce]──→ claimed

ClaimRecord fields

Field	Type	Description
`task_id`	str (sha256:…)	Links to the immutable TaskRecord
`claimer_run_id`	str ≤256	Agent that claimed the task
`claimed_at`	datetime (UTC)	Initial claim timestamp
`expires_at`	datetime (UTC)	Claim TTL; default 3600s; extended by heartbeat
`status`	str	`"claimed"` `"completed"` `"failed"` `"cancelled"` `"timed_out"`
`heartbeat_at`	datetime (UTC)	Last heartbeat timestamp
`claim_nonce`	str	Optimistic concurrency token for re-claim race resolution
`result`	dict \| null	Set by `muse coord complete --result JSON`
`error`	str \| null	Set by `muse coord fail-task --error MSG`

Claiming and re-claiming

muse coord claim scans the queue sorted by priority descending, then created_at ascending (FIFO within a priority band). It tries each candidate in order:

Pending task — open(O_CREAT|O_EXCL). POSIX-atomic. First agent to call wins; rest loop to the next candidate.
Timed-out task — write_text_atomic with a fresh claim_nonce, then read back. If the nonce matches, the agent won the re-claim race. If not, another agent beat it; loop to the next candidate.

muse coord claim --queue refactor --run-id agent-1 --json

muse coord complete <task_id> --run-id agent-1 \
  --result '{"symbols_renamed": 7}' --json

muse coord fail-task <task_id> --run-id agent-1 \
  --error "AST parse failed on line 42" --json

muse coord cancel-task <task_id> --run-id orch --json

muse coord tasks --queue refactor --status pending --json

Only the claimer_run_id may call complete, fail-task, or heartbeat on a claim. Orchestrators use cancel-task.

Reservations

A reservation is an advisory symbol-address lease. It does not prevent writes (no lock server required) — it signals intent so muse coord forecast can detect overlaps before they produce merge conflicts.

ReservationRecord fields

Field	Type	Description
`reservation_id`	str (sha256:…)	Content-addressed: `SHA-256(run_id, branch, sorted(addresses), operation)`
`run_id`	str ≤256	Agent identifier — use your pipeline ID
`branch`	str	Branch this agent is working on
`addresses`	list[str]	Symbol addresses; `fnmatch` glob patterns supported (`"billing.py::*"`)
`created_at`	datetime (UTC)	Reservation creation time
`expires_at`	datetime (UTC)	Hard TTL expiry; extended by heartbeat
`operation`	str \| null	`"modify"` `"rename"` `"delete"` `"extract"` `"move"` — used by Tier 3 forecast

The reservation ID is content-addressed — calling muse coord reserve twice with the same arguments returns the same ID and writes a single file. Idempotent.

# reserve one symbol
muse coord reserve "src/billing.py::compute_total" \
  --run-id agent-1 --op modify --ttl 7200 --json

# reserve an entire file with a glob
muse coord reserve "src/billing.py::*" \
  --run-id agent-1 --op rename --json

# release when done
muse coord release <reservation_id> --run-id agent-1 --json

# release everything for a run (end of pipeline)
muse coord release --all-for-run agent-1 --run-id agent-1 --json

Active-reservation check

A reservation is active when all of the following hold:

No release tombstone exists for reservation_id
now < max(expires_at, heartbeat.extended_expires_at)

Once a release tombstone is written it is checked before full JSON parsing — the stem-only scan makes active-check O(n) in file count, not file size.

Heartbeat — TTL extension

Long-running agents extend their reservations and claim TTLs with heartbeats. Each heartbeat atomically rewrites a single file with a new extended_expires_at.

Field	Description
`reservation_id`	Which reservation this keep-alive extends
`run_id`	Must match the reservation's `run_id`
`last_beat_at`	This heartbeat's timestamp
`extended_expires_at`	`now + extension_seconds` — new effective expiry

# extend reservation TTL
muse coord heartbeat <reservation_id> --run-id agent-1 --json

# extend task claim TTL
muse coord heartbeat <task_id> --run-id agent-1 --json

# recommended poll interval: extension_seconds / 2
# default extension: 3600s → heartbeat every ~1800s

Reservation TTL defaults: 3600s. Claim TTL defaults: 3600s. Task pending TTL defaults: 86400s (24h). Extension range: 1s – 31,536,000s (1 year).

Dependency DAG

Reservations can declare dependencies on other reservations — "I cannot start until agent-B's rename is complete." The DAG enforces ordering without a central scheduler. Acyclicity is checked at write time; cycles are structurally impossible on disk.

Three-step dependency chain

A rename must happen before modify, which must happen before the test update. Each reservation declares its predecessor with --depends-on:

bash build a rename → modify → test chain

# Step 1 — agent-A renames the symbol (no dependency)
RENAME_ID=$(muse coord reserve "src/billing.py::compute_total" \
  --run-id agent-A --op rename --json \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['reservation_id'])")

# Step 2 — agent-B modifies the renamed symbol (waits for step 1)
MODIFY_ID=$(muse coord reserve "src/billing.py::compute_invoice_total" \
  --run-id agent-B --op modify --depends-on $RENAME_ID --json \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['reservation_id'])")

# Step 3 — agent-C updates tests (waits for step 2)
muse coord reserve "tests/test_billing.py::test_total" \
  --run-id agent-C --op modify --depends-on $MODIFY_ID --json

Check whether agent-B is blocked before it starts work:

bash inspect the DAG

muse coord dag --json

json DAG output — topological order and blocked nodes

{
  "nodes": [
    { "reservation_id": "a1b2c3...", "run_id": "agent-A",
      "addresses": ["src/billing.py::compute_total"],
      "operation": "rename", "status": "active",   "depends_on": [] },
    { "reservation_id": "d4e5f6...", "run_id": "agent-B",
      "addresses": ["src/billing.py::compute_invoice_total"],
      "operation": "modify", "status": "blocked", "depends_on": ["a1b2c3..."] },
    { "reservation_id": "g7h8i9...", "run_id": "agent-C",
      "addresses": ["tests/test_billing.py::test_total"],
      "operation": "modify", "status": "blocked", "depends_on": ["d4e5f6..."] }
  ],
  "edges": [
    { "from": "a1b2c3...", "to": "d4e5f6...", "reason": "depends_on" },
    { "from": "d4e5f6...", "to": "g7h8i9...", "reason": "depends_on" }
  ],
  "topological_order": ["a1b2c3...", "d4e5f6...", "g7h8i9..."],
  "cycles": [],
  "active_count":  1,
  "blocked_count": 2
}

text visual (topological order — safe execution sequence)

agent-A  rename  compute_total           [ACTIVE]
  └─[depends_on]──▶ agent-B  modify  compute_invoice_total  [BLOCKED]
                      └─[depends_on]──▶ agent-C  modify  test_total  [BLOCKED]

bash inspect and release commands

# inspect the full DAG
muse coord dag --json

# show only active (unblocked) nodes
muse coord dag --active-only --json

DependencyRecord fields

Field	Description
`reservation_id`	The dependent — this reservation waits
`depends_on`	List of reservation IDs it waits for (max 256)
`created_at`	UTC timestamp

is_blocked(reservation_id) returns true if any dependency is still active. get_blocking(reservation_id) returns the list of active dependencies. DAG output includes topological order (dependencies before dependents — safe execution sequence) and detects cycles if one somehow exists.

Sharding

muse coord shard partitions the import graph into parallel-safe work zones. Agents assigned to different shards never touch the same files, eliminating merge conflicts by construction.

Algorithm

Build the import graph from the HEAD snapshot: file → file edges from import pseudo-symbols.
Compute weakly-connected components via DFS.
Apply LPT (Longest Processing Time) heuristic: greedily assign the largest component to the shard with the fewest symbols so far, minimizing makespan.
Count cross_shard_edges: imports that cross shard boundaries (lower = better isolation).

muse coord shard --agents 4 --json
muse coord shard --agents 8 --language Python --json
muse coord shard --agents 4 --commit HEAD~10 --json

ShardPlan output

{
  "commit": "f3c14a89",            // 8-char display ID
  "full_commit_id": "sha256:f3c14a89cd37…",
  "agents": 4,
  "shards_created": 4,           // may be < agents if repo has fewer components
  "total_files": 87,
  "total_symbols": 1204,
  "cross_shard_edges": 3,         // lower = better isolation
  "shards": [
    {
      "shard": 1,
      "files": ["src/billing.py", "src/models.py"],
      "symbol_count": 312,
      "coupling_score": 1           // edges from this shard to others
    }
  ]
}

If the repo has fewer weakly-connected components than --agents, shards_created < agents. The extra agents have nothing to claim and should idle or handle a different queue.

Conflict forecasting

muse coord forecast predicts merge conflicts before any agent writes code. It runs three independent passes over active reservations and intents, each with a calibrated confidence score.

Pass	Conflict type	Confidence	Condition
1	`"address_overlap"`	1.0	Same symbol address reserved by two or more different `run_id`s
2	`"blast_radius_overlap"`	0.75	One reserved address is a transitive caller of another (requires `muse code index`); skipped if call graph unavailable
3	`"operation_conflict"`	0.9	Intents on the same address combine incompatible operations: `delete` vs `modify`, `rename`, or `extract`

muse coord forecast --json
muse coord forecast --branch feat/auth --json
muse coord forecast --min-confidence 0.9 --json   # high-risk only

Output

{
  "active_reservations": 12,
  "call_graph_available": true,
  "partial_forecast": false,
  "conflicts": [
    {
      "conflict_type": "address_overlap",
      "addresses": ["src/billing.py::compute_total"],
      "agents": ["agent-1@feat/auth", "agent-2@feat/payments"],
      "confidence": 1.0,
      "description": "Symbol reserved by 2 agents on different branches"
    }
  ],
  "high_risk": 1,    // confidence >= 0.9
  "medium_risk": 0,  // 0.5 <= confidence < 0.9
  "low_risk": 0      // confidence < 0.5
}

Agents are shown as "run_id@branch". Pass 2 is silently skipped when the code intelligence index is unavailable, setting partial_forecast: true.

Watch — real-time event stream

muse coord watch streams coordination events as NDJSON (one JSON object per line). On macOS/BSD it uses kqueue (zero CPU between events); on Linux it falls back to polling at a configurable interval (default 1.0s).

muse coord watch --json
muse coord watch --once --json              # exit after first event
muse coord watch --kind reservation --json  # filter by kind
muse coord watch --run-id agent-1 --json   # filter by agent

Event types and fields

Field	Values	Description
`event_type`	`"snapshot"` `"added"` `"modified"` `"removed"` `"expired"`	What happened to the record
`kind`	`"reservation"` `"intent"` `"release"` `"heartbeat"`	Type of coordination record
`id`	str	ID of the changed record
`timestamp`	datetime (UTC)	When the event occurred
`data`	dict	Parsed record content; empty dict for removed records with no cache

"snapshot" events fire on startup for each existing record — useful for replaying current state before watching for changes.

Remote sync

Coordination state can be pushed to and pulled from MuseHub, enabling multi-machine agent pipelines. Remote records are stored read-only under .muse/coordination/remote/.

# push local coordination records to MuseHub (batches of 500)
muse coord sync push \
  --hub http://staging.musehub.ai \
  --owner gabriel --slug muse --json

# pull remote records (paginated, use --since-id for incremental)
muse coord sync pull \
  --hub http://staging.musehub.ai \
  --owner gabriel --slug muse \
  --since-id 1024 --limit 500 --json

Push is idempotent: the hub responds with {"inserted": N, "skipped": M}. Pull returns a cursor to use as --since-id in the next call. Supported kinds: reservation, intent, release, heartbeat, task, claim, dependency.

Storage layout

All coordination state is file-backed under .muse/coordination/. Every file is JSON. IDs are content-addressed sha256 hex strings used directly as filenames.

.muse/coordination/
  reservations/<sha256-hex>.json     ← write-once advisory symbol lease
  intents/<sha256-hex>.json          ← write-once operation declaration (no TTL)
  releases/<sha256-hex>.json         ← write-once tombstone (completed/cancelled)
  heartbeats/<sha256-hex>.json       ← mutable keep-alive; atomically overwritten
  tasks/<sha256-hex>.json            ← immutable task definition
  claims/<sha256-hex>.json           ← mutable claim state (O_CREAT|O_EXCL then atomic)
  dependencies/<sha256-hex>.json     ← write-once DAG edges
  remote/                            ← records synced from MuseHub (read-only)

GC policy

muse coord gc collects stale records after a configurable grace period (default 300s = 5 min, protecting against agents mid-read). Run dry-run first:

muse coord gc --json           # dry-run, shows what would be collected
muse coord gc --execute --json  # actually delete

Collected	Condition
Expired reservations	TTL exhausted (including heartbeat extensions), past grace period
Released reservations	Release tombstone present, older than grace period
Orphaned releases	Tombstone exists but reservation file gone
Orphaned heartbeats	Heartbeat exists but reservation file gone
Old intents (opt-in)	`--include-intents`; older than `--max-intent-age` (default 7 days)

CLI reference

All commands accept --json.

Task	Command
Enqueue a task	`muse coord enqueue "title" --run-id orch --json`
Claim next pending task	`muse coord claim --run-id agent-1 --json`
Claim from specific queue	`muse coord claim --queue refactor --run-id agent-1 --json`
Complete a task	`muse coord complete <task_id> --run-id agent-1 --json`
Fail a task	`muse coord fail-task <task_id> --run-id agent-1 --error "msg" --json`
Cancel a task	`muse coord cancel-task <task_id> --run-id orch --json`
List tasks	`muse coord tasks --status pending --json`
Reserve symbol addresses	`muse coord reserve "billing.py::Fn" --run-id agent-1 --op modify --json`
Reserve with TTL	`muse coord reserve "billing.py::Fn" --run-id agent-1 --ttl 7200 --json`
Reserve with dependency	`muse coord reserve "billing.py::Fn" --run-id agent-B --depends-on <id> --json`
Release a reservation	`muse coord release <id> --run-id agent-1 --json`
Release all for a run	`muse coord release --all-for-run agent-1 --run-id agent-1 --json`
Heartbeat (reservation)	`muse coord heartbeat <reservation_id> --run-id agent-1 --json`
Heartbeat (claim)	`muse coord heartbeat <task_id> --run-id agent-1 --json`
List active reservations	`muse coord list --json`
Inspect DAG	`muse coord dag --json`
Shard codebase	`muse coord shard --agents 4 --json`
Forecast conflicts	`muse coord forecast --json`
Reconcile merge order	`muse coord reconcile --json`
Watch events	`muse coord watch --json`
Push to MuseHub	`muse coord sync push --hub URL --owner O --slug S --json`
Pull from MuseHub	`muse coord sync pull --hub URL --owner O --slug S --json`
GC stale records (dry-run)	`muse coord gc --json`
GC (execute)	`muse coord gc --execute --json`

Default TTLs

Entity	Default TTL
Reservation	3600s (1h)
Task claim	3600s (1h)
Task pending	86400s (24h)
Heartbeat extension	3600s
Intent	No TTL — permanent until GC with `--include-intents`
GC grace period	300s (5 min)