gabriel/musehub · #59 · Closed · wire-protocol
Filed by gabriel · 18 days ago

Wire fetch pseudocode — client and server (first principles)


Overview

`fetch` downloads commits, snapshots, and blobs from a remote branch into the local repo without touching the working tree or advancing the local branch pointer. It is the shared primitive that both `clone` and `pull` call into.


Client

```
fetch <remote> <branch>:

  1. discover remote state
       GET /refs   (no filter; returns all branch heads)
         → { branch_heads: { <name>: <commit_id>, ... } }
       remote_tip = branch_heads[<branch>] or null
       if remote_tip is null:
           return { remote_tip: null, already_up_to_date: True }
           # exit 0, "nothing to fetch" (branch doesn't exist on remote)

  2. discover local state
       have      = all commit_ids reachable from ALL local branch heads
                   (what we already have locally; used as dedup anchors)
       local_tip = local branch head for <branch>, or null

       if local_tip == remote_tip:
           return { remote_tip, already_up_to_date: True }
           # exit 0, "already up-to-date"

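  3. compute want set
       want    = { remote_tip }     # MVP: single branch tip
       missing = want - have        # commits we don't have yet
       if missing is empty:
           # nothing to download, but remote_tip may still be ahead of local_tip
           # (e.g. reachable from another local branch), so record it for pull
           write local_root/.muse/FETCH_HEAD = remote_tip
           return { commits_written: 0, snapshots_written: 0, blobs_written: 0,
                    skipped_blobs: 0, remote_tip, already_up_to_date: False }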

  4. POST /fetch/plan { want: [remote_tip], have: [...have] }
       → { commits_to_send:   [commit_id, ...],    # topo sorted, parents first
           snapshots_to_send: [snapshot_id, ...],
           blobs_to_send:     [blob_id, ...],
           mpack_key:         "sha256:...",
           mpack_size_bytes:  N }
       The server computes the minimal set of objects the client is missing.
       The client trusts the list but re-verifies everything on receipt.

  5. GET /fetch/mpack?mpack_key=<mpack_key>
       → presigned GET URL for the mpack binary in the object store
         (may be a direct S3/MinIO presigned URL)

  6. GET <presigned_url>
       download mpack_bytes

  7. verify integrity before any writes:
       actual_key = blob_id(mpack_bytes)    # sha256("blob <size>\0" + bytes)
       if actual_key != mpack_key:
           abort "mpack integrity check failed"   # fetch corrupt or tampered

  8. parse mpack:
       verify b"MUSE" magic → abort if not MUSE binary format
       mpack = parse(mpack_bytes) → { commits, snapshots, blobs }

  9. write blobs to local object store:
       for each blob in mpack["blobs"]:
           actual_id = blob_id(blob["content"])
           if actual_id != blob["object_id"]:
               warn "skipped corrupt blob <blob.object_id>"
               skipped_blobs += 1
               continue
           if not has_object(local_root, blob["object_id"]):
               write_object(local_root, blob["object_id"], blob["content"])
               blobs_written += 1

  10. write snapshots to local snapshot store:
        for each snapshot in mpack["snapshots"]:    # topo order, parents first
            if has_snapshot(local_root, snapshot["snapshot_id"]):
                continue                             # idempotent
            write_snapshot(local_root, SnapshotRecord(**snapshot))
            snapshots_written += 1

  11. write commits to local commit store:
        for each commit in mpack["commits"]:        # topo order, parents first
            if has_commit(local_root, commit["commit_id"]):
                continue                             # idempotent
            write_commit(local_root, CommitRecord(**commit))
            commits_written += 1

  12. advance FETCH_HEAD (not the branch pointer):
        write local_root/.muse/FETCH_HEAD = remote_tip
        NOTE: fetch never touches refs/heads/<branch>; updating it is pull's job
              (fetch + merge)

  13. return { commits_written, snapshots_written, blobs_written,
               skipped_blobs, remote_tip, already_up_to_date: False }
```
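The integrity checks in steps 7 and 9 both hinge on `blob_id`. Below is a minimal Python sketch of those two steps, not the muse implementation: it assumes `blob_id` renders as `"sha256:" + hex digest` (inferred from the `mpack_key` example), and it uses a plain dict as a hypothetical stand-in for the local object store.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Content address: sha256 over b"blob <size>", a NUL byte, then the bytes.

    Assumption: rendered as "sha256:<hex>" to match the mpack_key format.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return "sha256:" + hashlib.sha256(header + content).hexdigest()


def verify_mpack(mpack_bytes: bytes, mpack_key: str) -> None:
    """Step 7: reject the whole pack before any write touches the local store."""
    if blob_id(mpack_bytes) != mpack_key:
        raise ValueError("mpack integrity check failed")


def write_blobs(blobs: list[dict], store: dict[str, bytes]) -> tuple[int, int]:
    """Step 9: skip blobs whose bytes don't match their id; write only new ones.

    `store` is a hypothetical in-memory stand-in for the local object store.
    Returns (blobs_written, skipped_blobs).
    """
    written = skipped = 0
    for blob in blobs:
        if blob_id(blob["content"]) != blob["object_id"]:
            print(f"warning: skipped corrupt blob {blob['object_id']}")
            skipped += 1
            continue
        if blob["object_id"] not in store:  # idempotent: existing objects are kept
            store[blob["object_id"]] = blob["content"]
            written += 1
    return written, skipped
```

Checking the whole pack first means a tampered download is rejected before anything is written; the per-blob check still matters because a pack can hash correctly yet carry wrong bytes for an individual object. Writing blobs, then snapshots, then commits also means an interrupted fetch never leaves a commit whose snapshot or blobs are missing locally.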


Server

POST /fetch/plan

```
receive { want: [commit_id, ...], have: [commit_id, ...] }:

  1. validate auth (MSign header)

  2. resolve want set:
       for each want_id:
           if not exists in commits table → 404 "commit not found: <id>"

  3. walk commit DAG from each want_id:
       commits_to_send = BFS/DFS from want set, stopping at any id in have set
                         (topo sorted: parents before children)
       NOTE: have set may be empty (clone case)

  4. collect snapshots:
       snapshots_to_send = { c.snapshot_id for c in commits_to_send }
                           minus any snapshot_id already in have's snapshot closure
       topo sort (parents before children)

  5. collect blobs:
       blobs_to_send = union of all object_ids referenced in the manifest
                       of each snapshot in snapshots_to_send,
                       minus any blob_id reachable from the have set
       NOTE: compute each manifest by applying deltas from the root,
             or load the stored manifest blob

  6. build mpack binary:
       mpack = build_wire_mpack({
           commits:   [commit_record(c) for c in commits_to_send],
           snapshots: [snapshot_record(s) for s in snapshots_to_send],
           blobs:     [BlobPayload(object_id=b, content=read_object(b))
                       for b in blobs_to_send],    # raw bytes (decompress zstd first)
       })
       mpack_key  = blob_id(mpack)    # sha256("blob <size>\0" + bytes)
       size_bytes = len(mpack)

  7. store mpack in object store (MinIO/S3):
       PUT <s3_bucket>/<mpack_key> = mpack bytes
       (skip the upload if already stored; mpack_key is content-addressed)

  8. return { commits_to_send:   [c.commit_id for c in commits_to_send],
              snapshots_to_send: [s.snapshot_id for s in snapshots_to_send],
              blobs_to_send:     [b for b in blobs_to_send],
              mpack_key:         mpack_key,
              mpack_size_bytes:  size_bytes }
```
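Server steps 3 to 5 can be sketched in Python as below, under assumptions worth stating: commits live in an in-memory dict of a hypothetical `Commit` record, snapshot manifests are already materialized as `{path: object_id}` dicts (rather than rebuilt from deltas), and `plan_fetch` is an illustrative name. Instead of stopping the walk at `have` ids, this version subtracts the full `have` closure, which stays exact for merge histories at the cost of walking more history; commit order comes from Kahn's topological sort.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    commit_id: str
    parents: tuple[str, ...]
    snapshot_id: str


def ancestors(start: list[str], commits: dict[str, Commit]) -> set[str]:
    """Commit ids reachable from `start`, inclusive. Ids unknown to the server
    (e.g. local-only commits in the client's have set) are ignored."""
    seen: set[str] = set()
    queue = deque(c for c in start if c in commits)
    while queue:
        cid = queue.popleft()
        if cid not in seen:
            seen.add(cid)
            queue.extend(p for p in commits[cid].parents if p in commits)
    return seen


def plan_fetch(want: list[str], have: list[str], commits: dict[str, Commit],
               manifests: dict[str, dict[str, str]]) -> tuple[list[str], list[str], list[str]]:
    for w in want:
        if w not in commits:
            raise LookupError(f"commit not found: {w}")  # served as a 404
    have_closure = ancestors(have, commits)
    missing = ancestors(want, commits) - have_closure

    # Kahn's algorithm: emit a commit once all of its missing parents are emitted.
    indegree = {c: 0 for c in missing}
    children: dict[str, list[str]] = {c: [] for c in missing}
    for c in missing:
        for p in commits[c].parents:
            if p in missing:
                indegree[c] += 1
                children[p].append(c)
    ready = deque(sorted(c for c in missing if indegree[c] == 0))
    ordered: list[str] = []
    while ready:
        cid = ready.popleft()
        ordered.append(cid)
        for child in sorted(children[cid]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Snapshots of the missing commits, minus those the client's history already has.
    have_snaps = {commits[c].snapshot_id for c in have_closure}
    snapshots: list[str] = []
    for cid in ordered:
        sid = commits[cid].snapshot_id
        if sid not in have_snaps and sid not in snapshots:
            snapshots.append(sid)

    # Blobs referenced by those snapshots, minus blobs the client can already reach.
    have_blobs = {oid for s in have_snaps for oid in manifests[s].values()}
    wanted = {oid for s in snapshots for oid in manifests[s].values()}
    return ordered, snapshots, sorted(wanted - have_blobs)
```

With `have=[]` (the clone case) the same call returns every reachable commit, snapshot and blob.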

GET /fetch/mpack

```
receive { mpack_key: "sha256:..." }   (query param)

  1. validate auth

  2. verify mpack_key exists in object store → 404 if missing (client must call /fetch/plan first)

  3. generate presigned GET URL (expiry=15 min)

  4. return { presigned_url }
```


FETCH_HEAD semantics

```
.muse/FETCH_HEAD: written by fetch, read by pull

format: <commit_id> (single line, the remote tip that was fetched)

pull reads FETCH_HEAD to know what to merge:
  fetched_tip = read(.muse/FETCH_HEAD)
  muse merge fetched_tip
```
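A minimal sketch of the FETCH_HEAD file handling described above, assuming UTF-8 text holding one commit id plus a trailing newline; the helper names are hypothetical. Writing through a temporary file and `os.replace` is a choice of this sketch rather than part of the spec: on POSIX the rename is atomic, so `pull` sees either the old tip or the new one, never a half-written file.

```python
import os
from pathlib import Path
from typing import Optional


def _fetch_head(repo_root: Path) -> Path:
    return repo_root / ".muse" / "FETCH_HEAD"


def write_fetch_head(repo_root: Path, commit_id: str) -> None:
    """Record the fetched remote tip as a single line."""
    path = _fetch_head(repo_root)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name("FETCH_HEAD.tmp")
    tmp.write_text(commit_id + "\n", encoding="utf-8")
    os.replace(tmp, path)  # swap in the new file in one rename


def read_fetch_head(repo_root: Path) -> Optional[str]:
    """Return the fetched tip, or None if no fetch has been recorded."""
    path = _fetch_head(repo_root)
    if not path.exists():
        return None
    return path.read_text(encoding="utf-8").strip() or None


def clear_fetch_head(repo_root: Path) -> None:
    """Called by pull and clone once the tip has been consumed."""
    _fetch_head(repo_root).unlink(missing_ok=True)
```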


Error table

| Condition | Exit code | Message |
| --- | --- | --- |
| Branch not on remote | 0 | "nothing to fetch" |
| Already up-to-date | 0 | "already up-to-date" |
| Integrity failure | 1 | "mpack integrity check failed" |
| Corrupt blob (skipped) | 3 (PARTIAL) | "skipped N corrupt blobs" |
| Remote commit not found | 1 | "commit not found: <id>" |
| Network error | 1 | "fetch failed: <reason>" |
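One way a client could map the table onto process exit codes, as a hedged sketch: `FetchExit`, `FetchError` and `run_fetch` are hypothetical names, network failures are assumed to be wrapped by the caller as `FetchError("fetch failed: <reason>")`, and the early exits are assumed to return a result with `remote_tip` null or `already_up_to_date: True`, alongside the normal result from client step 13.

```python
from enum import IntEnum
from typing import Callable


class FetchExit(IntEnum):
    OK = 0        # nothing to fetch, already up-to-date, or a clean fetch
    ERROR = 1     # integrity failure, remote commit not found, network error
    PARTIAL = 3   # fetch completed but some corrupt blobs were skipped


class FetchError(Exception):
    """Raised for every exit-1 row; the message is reported as-is."""


def run_fetch(fetch: Callable[[], dict]) -> tuple[FetchExit, str]:
    """Run a fetch and map its outcome onto an (exit code, message) pair."""
    try:
        result = fetch()
    except FetchError as exc:
        return FetchExit.ERROR, str(exc)
    if result.get("remote_tip") is None:
        return FetchExit.OK, "nothing to fetch"
    if result.get("already_up_to_date"):
        return FetchExit.OK, "already up-to-date"
    skipped = result.get("skipped_blobs", 0)
    if skipped:
        return FetchExit.PARTIAL, f"skipped {skipped} corrupt blobs"
    # The table defines no message for a clean fetch; this one is hypothetical.
    return FetchExit.OK, f"fetched {result.get('commits_written', 0)} commits"
```

Because `FetchExit` is an `IntEnum`, the first element can be passed straight to `sys.exit`.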

Pseudocode: clone (fetch + init)

```
clone <url> <dest>:

  1. mkdir <dest>
  2. muse init <dest>
  3. muse remote add local <url> (or origin — TBD)
  4. fetch local <default_branch> ← calls the fetch logic above
  5. checkout -b <default_branch>
  6. restore working tree from the snapshot of the FETCH_HEAD commit:
       manifest = manifest of FETCH_HEAD's snapshot_id
       for each (path, object_id) in manifest:
           content = read_object(local_root, object_id)
           write content to <dest>/<path>
  7. update refs/heads/<default_branch> = FETCH_HEAD
  8. clear FETCH_HEAD
```
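Clone step 6 can be sketched like this, assuming the manifest is a flat `{relative_path: object_id}` mapping and `read_object` is any callable returning raw bytes; `restore_tree` is a hypothetical name. It also rejects manifest paths that would resolve outside `<dest>` before writing anything, a safety check the pseudocode does not spell out but which a client writing server-supplied paths probably wants.

```python
from pathlib import Path
from typing import Callable


def restore_tree(dest: Path, manifest: dict[str, str],
                 read_object: Callable[[str], bytes]) -> int:
    """Write each manifest entry under dest and return the number of files written."""
    root = dest.resolve()
    targets = []
    # Validate every path first so a bad entry cannot leave a half-written tree.
    for rel_path, object_id in sorted(manifest.items()):
        target = (root / rel_path).resolve()
        # Reject absolute paths and "../" entries that would escape dest.
        if root not in target.parents:
            raise ValueError(f"unsafe path in manifest: {rel_path}")
        targets.append((target, object_id))
    for target, object_id in targets:
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(read_object(object_id))
    return len(targets)
```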

Pseudocode: pull (fetch + merge)

```
pull <remote> <branch>:

  1. fetch <remote> <branch>    ← calls the fetch logic above
     if already_up_to_date: exit 0

  2. fetched_tip = read(.muse/FETCH_HEAD)

  3. muse merge fetched_tip ← existing merge logic

  4. clear FETCH_HEAD
```


Out of scope (post-MVP)

  • Shallow fetch (`--depth N`)
  • Partial clone / blob filters (`--filter blob:none`)
  • Fetch by tag or arbitrary ref
  • Multi-remote fan-out fetch
  • Delta compression in the mpack wire format
  • Byte-range GET read path (`mpack://` URI + mpack_index table)
  • Fetch pack splitting for packs > 512 MiB
Activity
gabriel 16 days ago

Implementation complete — closing

All pseudocode steps are implemented and validated end-to-end.

Deviations from pseudocode (intentional improvements):

  • Client steps 4 and 5 merged: POST /fetch/plan + GET /fetch/mpack?key= were collapsed into a single POST /fetch/mpack that returns the presigned GET URL inline, saving one round-trip.
  • FETCH_HEAD: implemented as .muse/remotes/<remote>/<branch> tracking refs rather than a single global FETCH_HEAD file. This correctly tracks per-remote/per-branch state and is what muse pull reads.
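The per-remote tracking refs amount to one small file per (remote, branch) pair under `.muse/remotes/`. A minimal Python sketch, assuming each file holds the fetched commit id plus a newline (the file format and the helper names are assumptions, not the shipped code):

```python
from pathlib import Path
from typing import Optional


def tracking_ref(repo_root: Path, remote: str, branch: str) -> Path:
    # One file per (remote, branch); a branch like "feature/x" becomes a nested path.
    return repo_root / ".muse" / "remotes" / remote / branch


def update_tracking_ref(repo_root: Path, remote: str, branch: str, commit_id: str) -> None:
    path = tracking_ref(repo_root, remote, branch)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(commit_id + "\n", encoding="utf-8")


def read_tracking_ref(repo_root: Path, remote: str, branch: str) -> Optional[str]:
    path = tracking_ref(repo_root, remote, branch)
    return path.read_text(encoding="utf-8").strip() if path.exists() else None
```

Because the path is keyed by remote, fetching `main` from two remotes records two independent tips instead of overwriting a single FETCH_HEAD.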

Bugs fixed during implementation (all shipped):

  • signer_public_key was hardcoded to "" in push unpack — commit hash verification failed on clone
  • Snapshot delta chain processed in arrival order — parents not resolved if sent after children; fixed with multi-pass topo loop
  • directories not read in delta format path of _apply_snapshot_deltas — snapshot hash mismatch for repos with tracked directories
  • zstd blob encoding lost in build_wire_mpack — blob content arrived garbled; fixed by decompressing before packing
  • commit_exists filter on have anchors caused full re-push when remote had commits not in local store; removed
  • directories not included in snapshot deltas sent in push mpack — server stored []; fixed
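The multi-pass fix for out-of-order snapshot deltas can be illustrated with a short sketch. It assumes each record carries `snapshot_id`, `parent_id` (None for a root) and a `delta` mapping path to object_id (None meaning deleted); this record shape and `resolve_snapshots` are hypothetical and not the real `_apply_snapshot_deltas` signature. Each pass applies every delta whose parent is already resolved; a pass that makes no progress means a parent is genuinely missing.

```python
def resolve_snapshots(records: list[dict],
                      known: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Resolve snapshot deltas into full manifests regardless of arrival order.

    `known` maps already-resolved snapshot ids to manifests ({path: object_id}).
    """
    resolved = dict(known)
    pending = list(records)
    while pending:
        still_pending = []
        for rec in pending:
            parent = rec["parent_id"]
            if parent is not None and parent not in resolved:
                still_pending.append(rec)  # parent not resolved yet: retry next pass
                continue
            # Copy so the parent's manifest is never mutated.
            manifest = dict(resolved[parent]) if parent is not None else {}
            for path, object_id in rec["delta"].items():
                if object_id is None:
                    manifest.pop(path, None)  # path deleted in this snapshot
                else:
                    manifest[path] = object_id
            resolved[rec["snapshot_id"]] = manifest
        if len(still_pending) == len(pending):
            missing = sorted(r["snapshot_id"] for r in still_pending)
            raise ValueError(f"unresolvable snapshot parents: {missing}")
        pending = still_pending
    return resolved
```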

Validated: wire-hello (16 commits), muse-zsh (17 commits) — clone, fetch, pull all confirmed working.