MuseWire Protocol — GitHub-parity performance across all wire commands
The simple version
Moving code between a client and a server is like mailing packages.
Today Muse puts every file in its own envelope, addresses it, stamps it, and mails it one at a time. Git puts everything in one box and mails the box.
This epic makes Muse mail the box.
The one principle that guides every decision
Content-addressing is a proof, not a label.
Every object in Muse has an ID: `sha256:<hex>`. That ID is computed from the bytes.
If you hold an object and `sha256(its_bytes)` matches its ID, the content is correct.
No external check needed. The math IS the verification.
We were violating this. After uploading objects, we called MinIO HEAD on each one to verify it arrived. That is not trusting the hash. Every design decision in this epic must ask: "are we trusting the hash, or are we checking the work of the hash?"
If the answer is "checking the work" — delete that step.
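A minimal Python sketch of the rule. It assumes the ID is SHA-256 over the raw object bytes with no type header, as `sha256(its_bytes)` states. The function names are illustrative, not Muse's actual API.

```python
import hashlib


def object_id(data: bytes) -> str:
    """Content address in the sha256:<hex> form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify(claimed_id: str, data: bytes) -> bool:
    """The ID is the proof: recompute it from the bytes and compare."""
    return object_id(data) == claimed_id


blob = b"hello, muse\n"
oid = object_id(blob)
assert verify(oid, blob)                      # correct bytes match their ID
assert not verify(oid, blob + b"tampered")    # any change breaks the match
```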
The two paths — why they exist and what they mean
Stream path — small transfers (< 500 objects AND < 50 MB):
- Everything goes through the MuseHub server in one HTTP request
- Simple. One round trip.
Presign path — large transfers (≥ 500 objects OR ≥ 50 MB):
- Cloudflare kills any request that takes more than ~100 seconds
- Presigned URLs let the client/server talk directly to R2/MinIO, bypassing Cloudflare
- MuseHub just coordinates (issues the URL, indexes the result)
Both paths must exist. The threshold is fixed by Cloudflare's timeout, not by us. The problem today is not the two-path design — it is that the presign path sends N individual envelopes instead of one box.
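The routing rule fits in a few lines. This sketch takes the thresholds from the text and assumes "50 MB" means 50 MiB. `choose_path` and `WirePath` are hypothetical names.

```python
from enum import Enum

# Thresholds from the text: stream only below BOTH limits.
MAX_STREAM_OBJECTS = 500
MAX_STREAM_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" read as 50 MiB


class WirePath(Enum):
    STREAM = "stream"    # one HTTP request through MuseHub
    PRESIGN = "presign"  # direct to R2/MinIO, bypassing Cloudflare's ~100 s limit


def choose_path(n_objects: int, total_bytes: int) -> WirePath:
    # Exceeding either limit alone forces the presign path.
    if n_objects < MAX_STREAM_OBJECTS and total_bytes < MAX_STREAM_BYTES:
        return WirePath.STREAM
    return WirePath.PRESIGN
```

A single large file is enough to force the presign path: `choose_path(1, 60 * 1024 * 1024)` returns `WirePath.PRESIGN`. The benchmark repo below (5,139 objects, ~128 MB) is over both limits.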
The six wire commands
| Command | Direction | Wire cost today | Target |
|---|---|---|---|
| push | client → server | 7,181 MinIO round-trips | 1 bundle upload |
| fetch | server → client | N individual presigned GETs | 1 bundle download |
| clone | server → client | same as fetch, no `have` set | 1 bundle download |
| pull | server → client | wire cost = fetch | same as fetch |
| ls-remote | client reads remote refs | 1 HTTP GET | already correct |
| release push / tag push | client → server | small payloads, own endpoints | already correct |
The two that matter: push and fetch/clone. They are mirrors of each other. Fix push, then mirror the fix for fetch. Pull and clone fall out for free.
Push — how it should work (five steps)
1. NEGOTIATE: find what the server already has
2. PACK: assemble everything new into one bundle
3. TRANSFER: send the bundle in one request, direct to MinIO/R2
4. INDEX: server unpacks, writes to storage and database
5. ADVANCE REF: server moves the branch pointer
Step 1 — Negotiate
- A. Client reads remote branch heads (`GET /refs`)
- B. Client sends its local commits (`POST /negotiate`)
- C. Server ACKs what it has; returns common base
- Result: client knows exactly what is new
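A heavily simplified sketch of the negotiation. It assumes linear history and that the client lists its commits newest first. Over a DAG with merges there can be more than one common base. The function names and payload shape are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def negotiate(client_commits: list[str], server_commits: set[str]) -> tuple[list[str], Optional[str]]:
    """Server side of POST /negotiate (hypothetical shape).

    client_commits: the client's local commit IDs, newest first.
    Returns (acked, common_base): the IDs the server already has, and the
    newest of them, which is where the client's pack walk stops.
    """
    acked = [c for c in client_commits if c in server_commits]
    return acked, (acked[0] if acked else None)


def new_commits(client_commits: list[str], common_base: Optional[str]) -> list[str]:
    """Client side: everything newer than the common base (linear history only)."""
    if common_base is None:  # server has nothing: send everything
        return list(client_commits)
    return client_commits[: client_commits.index(common_base)]
```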
Step 2 — Pack
- A. BFS walk from local tip, stopping at common base
- B. Collect all new: commits + snapshot manifests + file objects
- C. Serialize as one MPackBundle (msgpack binary)
- D. `bundle_id = sha256(bundle_bytes)`: the bundle is content-addressed
- Result: one binary blob, self-describing, self-verifying
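Steps C and D can be sketched as below. JSON with base64-encoded file objects stands in for msgpack, so the example needs only the standard library. The payload field names and the `sha256:` prefix on `bundle_id` are assumptions, not the actual MPackBundle layout.

```python
from __future__ import annotations

import base64
import hashlib
import json


def pack_bundle(commits: list[dict], snapshots: list[dict],
                objects: dict[str, bytes]) -> tuple[str, bytes]:
    """Serialize one bundle and return (bundle_id, bundle_bytes)."""
    payload = {
        "version": 1,
        "commits": commits,
        "snapshots": snapshots,
        # Raw bytes are not JSON-safe, so file objects are base64-encoded.
        "objects": {oid: base64.b64encode(data).decode("ascii") for oid, data in objects.items()},
    }
    # Sorted keys and compact separators: identical input gives identical bytes.
    bundle_bytes = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # D. The bundle is content-addressed like everything else.
    bundle_id = "sha256:" + hashlib.sha256(bundle_bytes).hexdigest()
    return bundle_id, bundle_bytes
```

With the real `msgpack` package, `msgpack.packb(payload)` packs `bytes` values natively, so the base64 step goes away. Whatever the encoding, `bundle_id` is the hash of the exact bytes uploaded, so the receiver needs nothing else to check that they arrived intact.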
Step 3 — Transfer
- Stream path (small): POST bundle inline to `/push/stream`
- Presign path (large):
  - A. `POST /push/bundle/presign` → server issues one presigned PUT URL
  - B. `PUT bundle → MinIO/R2`: Cloudflare never sees it
- Result: bundle is in storage
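On the presign path, the client's side of step B is a single HTTP PUT of the whole bundle. This is a standard-library sketch. `upload_bundle` is a hypothetical name, and `send` is injectable only so the request can be inspected without a network.

```python
import urllib.request
from typing import Callable


def upload_bundle(presigned_url: str, bundle_bytes: bytes,
                  send: Callable = urllib.request.urlopen) -> int:
    """PUT the whole bundle to one presigned URL and return the HTTP status."""
    request = urllib.request.Request(
        presigned_url,
        data=bundle_bytes,
        method="PUT",
        # Without this, urllib labels a request body application/x-www-form-urlencoded.
        headers={"Content-Type": "application/octet-stream"},
    )
    # urlopen raises urllib.error.HTTPError for error statuses,
    # so a returned status is a success code.
    with send(request) as response:
        return response.status
```

If the server includes a content type when signing the URL, the client must send that same value or storage will reject the signature.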
Step 4 — Index (server-side)
- A. Read bundle by `bundle_id` from storage
- B. Verify `sha256(bundle_bytes) == bundle_id`: ONE hash check covers all contents
- C. Unpack: extract objects, snapshots, commits
- D. Write file objects to MinIO in parallel (warm S3 pool, fast)
- E. Bulk INSERT objects → `musehub_objects` (chunked at 500 rows)
- F. Bulk upsert snapshots → `musehub_snapshots` (chunked at 500 rows)
- G. Bulk INSERT commits → `musehub_commits` (chunked at 100 rows)
- H. Upsert object refs (chunked at 1,000 rows)
- No ghost guard. Step B already verified integrity. Trust the hash.
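This sketch covers steps B, C, and E through G on the server: one hash check, then chunked bulk inserts. It uses `sqlite3` so it runs anywhere. Only the table names come from the text. The column layout is hypothetical, JSON stands in for msgpack, and step D (writing objects to MinIO) and step H (object refs) are left out. The function deliberately does not commit. That lets the caller advance the ref in the same transaction, mirroring the single `session.commit()` in step 5.

```python
from __future__ import annotations

import base64
import hashlib
import json
import sqlite3
from typing import Iterator, Sequence

# Hypothetical minimal schema; only the table names come from the text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS musehub_objects   (id TEXT PRIMARY KEY, size INTEGER NOT NULL);
CREATE TABLE IF NOT EXISTS musehub_snapshots (id TEXT PRIMARY KEY, manifest TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS musehub_commits   (id TEXT PRIMARY KEY, parent TEXT, snapshot TEXT NOT NULL);
"""


def chunks(rows: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def index_bundle(conn: sqlite3.Connection, bundle_id: str, bundle_bytes: bytes) -> tuple[int, int, int]:
    # B. ONE hash check for the whole bundle; no per-object HEAD requests.
    if "sha256:" + hashlib.sha256(bundle_bytes).hexdigest() != bundle_id:
        raise ValueError("bundle hash mismatch")
    # C. Unpack the stand-in format.
    bundle = json.loads(bundle_bytes)
    objects = [(oid, len(base64.b64decode(b64))) for oid, b64 in bundle["objects"].items()]
    snapshots = [(s["id"], json.dumps(s["manifest"], sort_keys=True)) for s in bundle["snapshots"]]
    commits = [(c["id"], c.get("parent"), c["snapshot"]) for c in bundle["commits"]]
    # E-G. Chunked bulk inserts; the caller owns the transaction.
    for batch in chunks(objects, 500):
        conn.executemany("INSERT OR IGNORE INTO musehub_objects VALUES (?, ?)", batch)
    for batch in chunks(snapshots, 500):
        conn.executemany("INSERT OR IGNORE INTO musehub_snapshots VALUES (?, ?)", batch)
    for batch in chunks(commits, 100):
        conn.executemany("INSERT OR IGNORE INTO musehub_commits VALUES (?, ?, ?)", batch)
    return len(objects), len(snapshots), len(commits)
```

Every row is keyed by a content address, so re-inserting an existing row can safely be a no-op. That is why the sketch uses `INSERT OR IGNORE` throughout, and why re-pushing the same bundle is harmless. With `executemany`, SQLite runs the statement once per row, so here the chunking only mirrors the batch sizes from the text. Against PostgreSQL, a multi-row `INSERT` per chunk also keeps each statement's bind-parameter count bounded.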
Step 5 — Advance ref
- A. Lock branch row (`SELECT ... FOR UPDATE`)
- B. Fast-forward check (new head must be a descendant of the current tip, unless `--force`)
- C. Update branch pointer
- D. Single `session.commit()`: all of step 4 + step 5 is atomic
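Check B can be sketched as follows: the new head is a fast-forward only if the current tip is one of its ancestors. The sketch assumes the server can look up each commit's parents through a hypothetical `parents` map that covers both stored commits and the ones just unpacked. The row lock in step A is a database concern (`SELECT ... FOR UPDATE` in PostgreSQL) and is not shown.

```python
from __future__ import annotations

from collections import deque
from typing import Optional


def is_descendant(new_head: str, current_tip: Optional[str], parents: dict[str, list[str]]) -> bool:
    """True if current_tip is reachable from new_head via parent links."""
    if current_tip is None:  # new branch: any head is a fast-forward
        return True
    seen = {new_head}
    queue = deque([new_head])
    while queue:  # breadth-first over parents, handling merges
        commit = queue.popleft()
        if commit == current_tip:
            return True
        for parent in parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def check_fast_forward(new_head: str, current_tip: Optional[str],
                       parents: dict[str, list[str]], force: bool = False) -> None:
    """Raise unless the update is a fast-forward or --force was given."""
    if not force and not is_descendant(new_head, current_tip, parents):
        raise ValueError(f"non-fast-forward: {new_head} does not descend from {current_tip}")
```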
Fetch/Clone — mirror of push (five steps)
1. NEGOTIATE: tell server what we already have
2. PACK: server assembles delta into one bundle
3. TRANSFER: download bundle in one request, direct from MinIO/R2
4. UNPACK: client extracts objects, commits, snapshots
5. ADVANCE REF: client updates local branch pointer
Step 1 — Negotiate
- A. Client sends `want` (what it wants) and `have` (what it already has)
- B. Server walks DAG, computes delta
- Clone: `have` is empty, so the delta is the entire repo
Step 2 — Pack (server-side)
- A. Walk from `want`, stop at `have`
- B. Assemble MPackBundle: commits + snapshots + objects
- C. `bundle_id = sha256(bundle_bytes)`
- D. Write bundle to MinIO
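The text says "walk from `want`, stop at `have`." In a DAG with merges, stopping only at the `have` commits themselves can re-send history the client already holds when that history is also reachable through a second parent. This sketch therefore first marks every ancestor of `have` as excluded, then walks breadth-first from `want`. It assumes each `have` commit comes with its full history on the client. The names and the `parents` map are hypothetical. The same walk serves push step 2A, with the local tip as `want` and the common base as `have`.

```python
from __future__ import annotations

from collections import deque
from typing import Iterable


def ancestors(starts: Iterable[str], parents: dict[str, list[str]]) -> set[str]:
    """Every commit reachable from `starts`, including the starts themselves."""
    seen: set[str] = set()
    stack = list(starts)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def commits_to_send(want: list[str], have: set[str], parents: dict[str, list[str]]) -> list[str]:
    """BFS from the `want` tips, never entering history the client already has."""
    seen = ancestors(have, parents)  # the client holds all of these
    queue = deque()
    for tip in want:
        if tip not in seen:
            seen.add(tip)
            queue.append(tip)
    out = []
    while queue:
        commit = queue.popleft()
        out.append(commit)
        for parent in parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return out
```

For clone, `have` is empty, so the walk returns the entire history.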
Step 3 — Transfer
- Stream path (small): server streams bundle inline
- Presign path (large):
  - A. Server writes bundle to MinIO, generates presigned GET URL
  - B. `GET bundle from MinIO/R2`: Cloudflare never sees it
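Because the bundle ID covers the whole body, the client can hash while it downloads instead of making a second pass. This sketch assumes the presigned GET response is read as a file-like stream, which is what `urllib.request.urlopen` returns. `read_and_verify` is a hypothetical name.

```python
import hashlib
import io
from typing import BinaryIO


def read_and_verify(stream: BinaryIO, bundle_id: str, chunk_size: int = 1 << 20) -> bytes:
    """Read a bundle body in chunks, hashing as it arrives, then check bundle_id."""
    hasher = hashlib.sha256()
    parts = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)  # incremental hash, same result as hashing the whole body
        parts.append(chunk)
    if "sha256:" + hasher.hexdigest() != bundle_id:
        raise ValueError("bundle hash mismatch")
    return b"".join(parts)


# Usage with an in-memory stream standing in for the HTTP response body.
data = b"bundle-bytes"
assert read_and_verify(io.BytesIO(data), "sha256:" + hashlib.sha256(data).hexdigest(), chunk_size=4) == data
```

Against a real URL this would be `with urllib.request.urlopen(presigned_url) as resp: bundle = read_and_verify(resp, bundle_id)`.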
Step 4 — Unpack (client-side)
- A. Verify `sha256(bundle_bytes) == bundle_id`
- B. Write file objects to local store
- C. Write snapshots to local store
- D. Write commits to local store
- No per-object verification loop. Step A covers all contents.
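This sketches steps A through D against a hypothetical on-disk layout (`<store>/<kind>/<first two hex chars>/<rest>`), using the same JSON stand-in for the bundle format. Writing objects, then snapshots, then commits is one plausible reason for the order in the list: if the process dies midway, no commit on disk points at missing data. The helper names and layout are assumptions. The IDs in the usage are placeholders, not real hashes.

```python
from __future__ import annotations

import base64
import hashlib
import json
import tempfile
from pathlib import Path


def _path(store: Path, kind: str, oid: str) -> Path:
    # Hypothetical layout: <store>/<kind>/<first 2 hex chars>/<remaining hex>.
    hexdigest = oid.split(":", 1)[-1]
    return store / kind / hexdigest[:2] / hexdigest[2:]


def _write_once(path: Path, data: bytes) -> None:
    # Same ID means same bytes, so an existing file never needs rewriting.
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)


def unpack_bundle(bundle_bytes: bytes, bundle_id: str, store: Path) -> None:
    # A. One hash check over the whole bundle, no per-object loop.
    if "sha256:" + hashlib.sha256(bundle_bytes).hexdigest() != bundle_id:
        raise ValueError("bundle hash mismatch")
    bundle = json.loads(bundle_bytes)
    # B. File objects first ...
    for oid, b64 in bundle["objects"].items():
        _write_once(_path(store, "objects", oid), base64.b64decode(b64))
    # C. ... then snapshots ...
    for snap in bundle["snapshots"]:
        _write_once(_path(store, "snapshots", snap["id"]), json.dumps(snap, sort_keys=True).encode())
    # D. ... and commits last, so no commit lands before the data it references.
    for commit in bundle["commits"]:
        _write_once(_path(store, "commits", commit["id"]), json.dumps(commit, sort_keys=True).encode())


# Usage: a tiny bundle in the stand-in format.
payload = {
    "objects": {"sha256:ab12": base64.b64encode(b"hi").decode()},
    "snapshots": [{"id": "sha256:cd34", "manifest": {"a.txt": "sha256:ab12"}}],
    "commits": [{"id": "sha256:ef56", "snapshot": "sha256:cd34", "parent": None}],
}
raw = json.dumps(payload).encode()
with tempfile.TemporaryDirectory() as tmp:
    unpack_bundle(raw, "sha256:" + hashlib.sha256(raw).hexdigest(), Path(tmp))
    assert (Path(tmp) / "objects" / "ab" / "12").read_bytes() == b"hi"
```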
Step 5 — Advance ref
- A. Fast-forward or reset depending on fetch mode
- B. Update local branch pointer
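The text doesn't say how local refs are stored. This sketch assumes one file per branch under `refs/heads/`, a hypothetical layout. Under that layout, step B can be made crash-safe with write-then-rename: the ref ends up pointing at either the old head or the new one, never at a partial write. Choosing between fast-forward and reset (step A) happens before this call.

```python
import os
import tempfile
from pathlib import Path


def update_ref(repo: Path, branch: str, new_head: str) -> None:
    """Point refs/heads/<branch> at new_head without leaving a half-written ref."""
    ref_path = repo / "refs" / "heads" / branch
    ref_path.parent.mkdir(parents=True, exist_ok=True)
    # Write the new value to a temp file in the same directory first ...
    fd, tmp = tempfile.mkstemp(dir=str(ref_path.parent), prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(new_head + "\n")
            f.flush()
            os.fsync(f.fileno())
        # ... then rename it over the ref; os.replace is atomic on POSIX filesystems.
        os.replace(tmp, ref_path)
    except BaseException:
        os.unlink(tmp)
        raise
```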
Content-addressing checklist
At every step, ask: are we trusting the hash or checking the work of the hash?
| Step | Old (wrong) | New (right) |
|---|---|---|
| Push: verify objects arrived in MinIO | HEAD each object (ghost guard) | sha256(bundle) == bundle_id proves it |
| Push: store snapshot blobs in MinIO separately | 1,018 individual PUTs | snapshots are in the bundle, indexed once |
| Push: store commit blobs in MinIO separately | 1,024 individual PUTs | commits are in the bundle, indexed once |
| Fetch: verify each received object | per-object hash check loop | sha256(bundle) proves all contents at once |
Performance targets (GitHub parity)
Benchmark: 1,024-commit repo, 5,139 file objects, ~128 MB total.
| Command | Today | Target |
|---|---|---|
| push (fresh) | 2m 44s | < 15s |
| fetch (fresh clone) | not measured | < 15s |
| pull (incremental, ~10 commits) | not measured | < 2s |
| ls-remote | < 1s | < 1s (already fine) |
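The numbers in this epic are consistent with each other. The 7,181 push round trips in the wire-command table equal the 5,139 file objects plus the 1,018 snapshot PUTs and 1,024 commit PUTs from the checklist. A quick check of what today's time and the target imply:

```python
# Figures taken from the text.
file_objects = 5_139
snapshot_puts = 1_018
commit_puts = 1_024
round_trips = file_objects + snapshot_puts + commit_puts  # matches the 7,181 in the table

push_today_s = 2 * 60 + 44  # 2m 44s = 164 s
target_s = 15
repo_mb = 128

# Upper bound: assumes the whole push time went to round trips.
ms_per_round_trip = push_today_s / round_trips * 1000
required_mb_per_s = repo_mb / target_s

print(round_trips, round(ms_per_round_trip, 1), round(required_mb_per_s, 1))  # → 7181 22.8 8.5
```

If all 164 s went to those round trips, each cost about 23 ms on average. That is an upper bound, since packing and indexing also take time. Meeting the 15 s target for ~128 MB requires roughly 8.5 MB/s sustained end to end.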
Child tickets
- #45 — Push: bundle-based upload (implementation spec with exact code)
- #47 — Fetch/Clone: bundle-based download (mirror of #45)
Work in order: push first (data must get to server before fetch can bring it back).
Branches
- `~/ecosystem/muse` → `task/bundle-push` (client changes)
- `~/ecosystem/musehub` → `task/bundle-push` (server changes)
Stream path — architectural intent (do not delete)
The stream path (push/stream, O frames) is not legacy code. It is the future micropayment-capable push path and must be preserved permanently alongside the bundle/presign path.
Why two paths exist:
Presign/bundle path — bulk efficiency. Client packs everything into one MPackBundle, uploads via a single presigned PUT directly to MinIO (bypassing Cloudflare), server unpacks in one pass. Optimal for large pushes. The server never sees individual objects, so there is no per-object hook for payment.
Stream path (O frames): real-time, per-object, payment-capable transfer. Each O frame carries one content-addressed object, and its `sha256:<hex>` ID is simultaneously the integrity proof and the billable unit: you cannot fake delivery because the frame IS the proof. We know of no existing system that does per-HTTP-frame streaming micropayments bound to content-addressed objects. This is the primitive for that.
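Unlike the bundle path, the stream path gets its value from proving each object individually. This sketch shows a receiver for that. The frame layout (tag byte, 32-byte digest, 4-byte length, payload) is invented purely for illustration. The real O-frame format is not specified in this epic.

```python
import hashlib
import struct
from typing import Iterator, Tuple

# Hypothetical frame layout, for illustration only:
# b"O" | 32-byte SHA-256 digest | 4-byte big-endian payload length | payload
HEADER = struct.Struct(">c32sI")


def encode_frame(payload: bytes) -> bytes:
    return HEADER.pack(b"O", hashlib.sha256(payload).digest(), len(payload)) + payload


def decode_frames(buf: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (object_id, payload) per frame; reject any frame whose bytes don't match its ID."""
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < HEADER.size:
            raise ValueError("truncated frame header")
        tag, digest, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        payload = buf[offset:offset + length]
        offset += length
        if tag != b"O" or len(payload) != length:
            raise ValueError("malformed frame")
        # A frame that hashes correctly is proof of delivery for exactly
        # that object, which is what makes it usable as a billable unit.
        if hashlib.sha256(payload).digest() != digest:
            raise ValueError("payload does not match its object ID")
        yield "sha256:" + digest.hex(), payload
```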
Future direction: HTTP/2 bidirectional multiplexing + per-frame MSign-signed micropayment authorization. Each O frame in flight carries a payment channel update authorizing payment for exactly that object. The presign path cannot do this — the bundle bypasses the server entirely.
Implication for this epic: the stream path is not a code path to optimize away. It needs security parity with the bundle path (tracked in issue #51) and then becomes the foundation for the payment layer.
Master performance gates
Every command has a pass/fail gate. A command is not done until it passes.
Push (#45)
`time muse push local dev`
Fetch / Clone (#47): do not start until push passes
`time muse clone <url>`
Pull: falls out of fetch
Pull = fetch + local merge. When fetch passes its gate, pull passes too.
ls-remote, release push, tag push
Already pass. No work needed.
The rule: measure each step individually. If a step fails its gate, fix that step before moving on. Do not optimize step 4 while step 3 is still failing.
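One way to enforce the rule is a small timing gate that raises when a command or step misses its limit. This sketch assumes the gates are the targets from the performance table (push < 15s, clone < 15s, pull < 2s). `gate`, `GateFailed`, and the `muse` wrapper are hypothetical helpers, not part of the Muse CLI.

```python
import subprocess
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class GateFailed(Exception):
    """Raised when a command or step is not under its time gate."""


timings: Dict[str, float] = {}


@contextmanager
def gate(name: str, limit_s: float) -> Iterator[None]:
    """Time the enclosed block, record it, and fail if it is not under limit_s."""
    start = time.perf_counter()
    yield
    elapsed = time.perf_counter() - start
    timings[name] = elapsed
    if elapsed >= limit_s:
        raise GateFailed(f"{name}: {elapsed:.2f}s, gate is < {limit_s:g}s")


def muse(*args: str) -> None:
    """Run the muse CLI (assumed to be on PATH); raise if it exits non-zero."""
    subprocess.run(["muse", *args], check=True)


# Usage, with targets taken from the performance table:
#   with gate("push", 15):
#       muse("push", "local", "dev")
#   with gate("clone", 15):
#       muse("clone", url)
```

The same context manager can wrap a single step inside the client or server. The text gives no per-step limits, so none are suggested here.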