gabriel / musehub public
Open #46 Performance
filed by gabriel human · 34 days ago

MuseWire Protocol — GitHub-parity performance across all wire commands


The simple version

Moving code between a client and a server is like mailing packages.

Today Muse puts every file in its own envelope, addresses it, stamps it, and mails it one at a time. Git puts everything in one box and mails the box.

This epic makes Muse mail the box.


The one principle that guides every decision

Content-addressing is a proof, not a label.

Every object in Muse has an ID: sha256:<hex>. That ID is computed from the bytes. If you hold an object and compute sha256(its_bytes) and it matches the ID — the content is correct. No external check needed. The math IS the verification.

We were violating this. After uploading objects, we called MinIO HEAD on each one to verify it arrived. That is not trusting the system. Every design decision in this epic must ask: "are we trusting the hash, or are we checking the work of the hash?"

If the answer is "checking the work" — delete that step.
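
A minimal Python sketch of the principle, assuming IDs are formed as sha256:<hex> over the raw bytes, as described above. The names object_id and verify are illustrative and not taken from the Muse codebase.

```python
import hashlib


def object_id(data: bytes) -> str:
    """Derive the content address from the bytes themselves."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify(data: bytes, claimed_id: str) -> bool:
    """True when the bytes hash to the claimed ID; no external lookup involved."""
    return object_id(data) == claimed_id


blob = b"hello muse"
oid = object_id(blob)
assert verify(blob, oid)             # untouched bytes match their ID
assert not verify(blob + b"!", oid)  # any change to the bytes breaks the match
```

Nothing in verify touches storage or the network, which is the point: holding the bytes and the ID is enough.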


The two paths — why they exist and what they mean

Stream path — small transfers (< 500 objects AND < 50 MB):

  • Everything goes through the MuseHub server in one HTTP request
  • Simple. One round trip.

Presign path — large transfers (≥ 500 objects OR ≥ 50 MB):

  • Cloudflare kills any request that takes more than ~100 seconds
  • Presigned URLs let the client talk directly to R2/MinIO, bypassing Cloudflare
  • MuseHub just coordinates (issues the URL, indexes the result)

Both paths must exist. The threshold is fixed by Cloudflare's timeout, not by us. The problem today is not the two-path design — it is that the presign path sends N individual envelopes instead of one box.
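
The path selection rule can be sketched as below. The thresholds come from the text above; the function name choose_path and the reading of "50 MB" as 50 MiB are assumptions made for illustration.

```python
MAX_STREAM_OBJECTS = 500             # from the thresholds above
MAX_STREAM_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" is read as 50 MiB


def choose_path(object_count: int, total_bytes: int) -> str:
    """Return "stream" only when BOTH limits are respected, else "presign"."""
    if object_count < MAX_STREAM_OBJECTS and total_bytes < MAX_STREAM_BYTES:
        return "stream"
    return "presign"


print(choose_path(10, 1_000_000))            # small push
print(choose_path(5_139, 128 * 1024 * 1024)) # the benchmark repo
```

Crossing either limit is enough to switch paths, which is why the stream condition uses AND and the presign condition uses OR.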


The six wire commands

Command                  Direction                 Wire cost today                Target
push                     client → server           7,181 MinIO round-trips        1 bundle upload
fetch                    server → client           N individual presigned GETs    1 bundle download
clone                    server → client           same as fetch, no have set     1 bundle download
pull                     server → client           wire cost = fetch              same as fetch
ls-remote                client reads remote refs  1 HTTP GET                     already correct
release push / tag push  client → server           small payloads, own endpoints  already correct

The two that matter: push and fetch/clone. They are mirrors of each other. Fix push, then mirror the fix for fetch. Pull and clone fall out for free.


Push — how it should work (five steps)

1. NEGOTIATE    find what the server already has
2. PACK         assemble everything new into one bundle
3. TRANSFER     send the bundle — one request, direct to MinIO/R2
4. INDEX        server unpacks, writes to storage and database
5. ADVANCE REF  server moves the branch pointer

Step 1 — Negotiate

  • A. Client reads remote branch heads (GET /refs)
  • B. Client sends its local commits (POST /negotiate)
  • C. Server ACKs what it has; returns common base
  • Result: client knows exactly what is new
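
A rough sketch of the negotiation outcome, assuming a linear history and that the client offers commit IDs newest first. The helper names server_ack and negotiate, and the in-memory sets standing in for the /negotiate exchange, are hypothetical.

```python
from typing import List, Optional, Set, Tuple


def server_ack(offered: List[str], server_commits: Set[str]) -> List[str]:
    """Server side: ACK the offered commits it already holds, in the order offered."""
    return [c for c in offered if c in server_commits]


def negotiate(local_commits: List[str],
              server_commits: Set[str]) -> Tuple[Optional[str], List[str]]:
    """Client side: return (common_base, new_commits).

    local_commits is ordered newest first, so the first ACKed commit is the
    most recent one both sides share.
    """
    acked = server_ack(local_commits, server_commits)
    common_base = acked[0] if acked else None
    acked_set = set(acked)
    new_commits = [c for c in local_commits if c not in acked_set]
    return common_base, new_commits


base, new = negotiate(["c4", "c3", "c2", "c1"], {"c1", "c2"})
print(base, new)
```

When the server ACKs nothing, common_base is None and every local commit counts as new, which corresponds to a first push.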

Step 2 — Pack

  • A. BFS walk from local tip, stopping at common base
  • B. Collect all new: commits + snapshot manifests + file objects
  • C. Serialize as one MPackBundle (msgpack binary)
  • D. bundle_id = sha256(bundle_bytes) — the bundle is content-addressed
  • Result: one binary blob, self-describing, self-verifying
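
The pack step above could look roughly like this. Two assumptions are worth stating loudly: canonical JSON is used here as a stdlib-only stand-in for the msgpack encoding of MPackBundle, and the payload layout (three dictionaries keyed by ID) is invented for the example. The walk stops only at the single common base; a real implementation would presumably stop at every commit the server ACKed.

```python
import hashlib
import json
from collections import deque


def walk_new_commits(tip, common_base, parents):
    """BFS from tip over parent links, never expanding the common base."""
    seen, order = set(), []
    queue = deque([tip])
    while queue:
        cid = queue.popleft()
        if cid in seen or cid == common_base:
            continue
        seen.add(cid)
        order.append(cid)
        queue.extend(parents.get(cid, []))
    return order


def pack(commits, snapshots, objects):
    """Serialize everything new into one blob and address it by its hash."""
    payload = {"commits": commits, "snapshots": snapshots, "objects": objects}
    # Stand-in for msgpack: sorted keys make the bytes deterministic.
    bundle_bytes = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    bundle_id = "sha256:" + hashlib.sha256(bundle_bytes).hexdigest()
    return bundle_id, bundle_bytes


parents = {"c3": ["c2"], "c2": ["c1"], "c1": []}
new_ids = walk_new_commits("c3", "c1", parents)
commits = {cid: {"parents": parents[cid]} for cid in new_ids}
bundle_id, bundle_bytes = pack(commits, {"s3": ["o1"]}, {"o1": b"file bytes".hex()})
```

Deterministic serialization matters here: the same inputs must always produce the same bytes, otherwise the same logical bundle would get different IDs.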

Step 3 — Transfer

  • Stream path (small): POST bundle inline to /push/stream
  • Presign path (large):
    • A. POST /push/bundle/presign → server issues one presigned PUT URL
    • B. PUT bundle → MinIO/R2 — Cloudflare never sees it
  • Result: bundle is in storage
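
As an illustration of the two transfer shapes, the sketch below only builds the HTTP requests with Python's standard urllib and does not send them. The hub URL, the presigned URL, and the Content-Type header are placeholders; only the /push/stream path and the PUT-to-presigned-URL shape come from the steps above.

```python
import urllib.request


def build_stream_request(hub_url: str, bundle_bytes: bytes) -> urllib.request.Request:
    """Small transfer: POST the bundle inline to the hub's /push/stream endpoint."""
    return urllib.request.Request(
        hub_url.rstrip("/") + "/push/stream",
        data=bundle_bytes,
        headers={"Content-Type": "application/octet-stream"},  # assumed header
        method="POST",
    )


def build_presigned_put(presigned_url: str, bundle_bytes: bytes) -> urllib.request.Request:
    """Large transfer: PUT the bundle straight to object storage."""
    return urllib.request.Request(
        presigned_url,
        data=bundle_bytes,
        headers={"Content-Type": "application/octet-stream"},  # assumed header
        method="PUT",
    )


# Placeholder URLs; passing either request to urllib.request.urlopen() would send it.
stream_req = build_stream_request("https://hub.example/gabriel/musehub/", b"bundle")
put_req = build_presigned_put("https://storage.example/bundles/abc?signature=placeholder", b"bundle")
```

In both cases the whole bundle travels in a single request body; the only difference is which host receives it.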

Step 4 — Index (server-side)

  • A. Read bundle by bundle_id from storage
  • B. Verify sha256(bundle_bytes) == bundle_id — ONE hash check covers all contents
  • C. Unpack: extract objects, snapshots, commits
  • D. Write file objects to MinIO in parallel (warm S3 pool — fast)
  • E. Bulk INSERT objects → musehub_objects (chunked at 500 rows)
  • F. Bulk upsert snapshots → musehub_snapshots (chunked at 500 rows)
  • G. Bulk INSERT commits → musehub_commits (chunked at 100 rows)
  • H. Upsert object refs (chunked at 1,000 rows)
  • No ghost guard. Step B already verified integrity. Trust the hash.
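
The chunked bulk inserts in steps E to H can be sketched as follows. This uses an in-memory SQLite database as a stand-in for the real database, and a one-column musehub_objects schema that is assumed purely for the example; the chunk size of 500 and the object count of 5,139 are taken from this ticket.

```python
import sqlite3


def chunked(rows, size):
    """Yield successive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def bulk_insert_objects(conn, object_ids, chunk_size=500):
    """Insert object rows chunk by chunk; return the number of batches issued."""
    batches = 0
    for chunk in chunked(object_ids, chunk_size):
        conn.executemany(
            "INSERT INTO musehub_objects (object_id) VALUES (?)",
            [(oid,) for oid in chunk],
        )
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE musehub_objects (object_id TEXT PRIMARY KEY)")  # assumed schema
ids = [f"sha256:{i:064x}" for i in range(5139)]  # fake IDs, benchmark-sized
batches = bulk_insert_objects(conn, ids)
conn.commit()  # one commit after all chunks, so the inserts land together
row_count = conn.execute("SELECT COUNT(*) FROM musehub_objects").fetchone()[0]
```

Chunking bounds the size of each statement batch without changing the transaction boundary: nothing is committed until every chunk has been issued.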

Step 5 — Advance ref

  • A. Lock branch row (SELECT FOR UPDATE)
  • B. Fast-forward check (new head must be descendant of current tip, or --force)
  • C. Update branch pointer
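  • D. Single session.commit(): the database writes of step 4 and the ref update of step 5 commit atomically (the MinIO writes in step 4D are outside the transaction)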
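
The fast-forward check in step B amounts to a reachability test over parent links. The sketch below assumes an in-memory parents map and treats a missing current tip as a new branch; row locking and the final commit are left to the database layer and are not shown.

```python
def is_descendant(new_head, current_tip, parents):
    """True when current_tip is reachable from new_head via parent links."""
    stack, seen = [new_head], set()
    while stack:
        cid = stack.pop()
        if cid == current_tip:
            return True
        if cid in seen:
            continue
        seen.add(cid)
        stack.extend(parents.get(cid, []))
    return False


def can_advance(new_head, current_tip, parents, force=False):
    """Allow the ref update for a new branch, a fast-forward, or an explicit force."""
    if force or current_tip is None:
        return True
    return is_descendant(new_head, current_tip, parents)


# c1 <- c2 <- c3 is the branch; x1 forks off c1 and does not contain c2.
parents = {"c3": ["c2"], "c2": ["c1"], "c1": [], "x1": ["c1"]}
print(can_advance("c3", "c2", parents))              # fast-forward
print(can_advance("x1", "c2", parents))              # diverged history
print(can_advance("x1", "c2", parents, force=True))  # forced update
```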

Fetch/Clone — mirror of push (five steps)

1. NEGOTIATE    tell server what we already have
2. PACK         server assembles delta into one bundle
3. TRANSFER     download bundle — one request, direct from MinIO/R2
4. UNPACK       client extracts objects, commits, snapshots
5. ADVANCE REF  client updates local branch pointer

Step 1 — Negotiate

  • A. Client sends want (what it wants) and have (what it already has)
  • B. Server walks DAG, computes delta
  • Clone: have is empty — delta is the entire repo

Step 2 — Pack (server-side)

  • A. Walk from want, stop at have
  • B. Assemble MPackBundle: commits + snapshots + objects
  • C. bundle_id = sha256(bundle_bytes)
  • D. Write bundle to MinIO
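
The want/have walk from the negotiate and pack steps above can be sketched like this. The function name compute_delta and the in-memory parents map are assumptions; the walk simply refuses to expand any commit the client already has.

```python
from collections import deque


def compute_delta(wants, haves, parents):
    """Commits reachable from `wants` without expanding anything in `haves`."""
    haves = set(haves)
    seen, delta = set(), []
    queue = deque(wants)
    while queue:
        cid = queue.popleft()
        if cid in seen or cid in haves:
            continue
        seen.add(cid)
        delta.append(cid)
        queue.extend(parents.get(cid, []))
    return delta


parents = {"c4": ["c3"], "c3": ["c2"], "c2": ["c1"], "c1": []}
incremental = compute_delta(["c4"], ["c2"], parents)  # client already has c2
clone = compute_delta(["c4"], [], parents)            # empty have set
```

With an empty have set nothing stops the walk, so the delta is the whole history; clone needs no separate code path.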

Step 3 — Transfer

  • Stream path (small): server streams bundle inline
  • Presign path (large):
    • A. Server generates a presigned GET URL for the bundle it wrote to MinIO in step 2
    • B. GET bundle from MinIO/R2 — Cloudflare never sees it

Step 4 — Unpack (client-side)

  • A. Verify sha256(bundle_bytes) == bundle_id
  • B. Write file objects to local store
  • C. Write snapshots to local store
  • D. Write commits to local store
  • No per-object verification loop. Step A covers all contents.
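
A sketch of verify-once-then-write on the client. As before, JSON stands in for msgpack decoding, the payload layout is invented, and a plain dictionary plays the role of the local store; BundleIntegrityError is a hypothetical exception name.

```python
import hashlib
import json


class BundleIntegrityError(Exception):
    """Raised when the received bytes do not hash to the advertised bundle ID."""


def unpack(bundle_bytes, bundle_id, store):
    """Verify the bundle once, then write its contents without per-object checks."""
    actual = "sha256:" + hashlib.sha256(bundle_bytes).hexdigest()
    if actual != bundle_id:
        raise BundleIntegrityError(f"expected {bundle_id}, got {actual}")
    payload = json.loads(bundle_bytes)  # stand-in for msgpack decoding
    for kind in ("objects", "snapshots", "commits"):  # write order from steps B to D
        store.setdefault(kind, {}).update(payload.get(kind, {}))
    return store


payload = {
    "objects": {"o1": b"file bytes".hex()},
    "snapshots": {"s1": ["o1"]},
    "commits": {"c1": {"snapshot": "s1"}},
}
bundle_bytes = json.dumps(payload, sort_keys=True).encode("utf-8")
bundle_id = "sha256:" + hashlib.sha256(bundle_bytes).hexdigest()
store = unpack(bundle_bytes, bundle_id, {})
```

A single altered byte anywhere in the bundle changes the hash, so the check at the top rejects the transfer before anything is written.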

Step 5 — Advance ref

  • A. Fast-forward or reset depending on fetch mode
  • B. Update local branch pointer

Content-addressing checklist

At every step, ask: are we trusting the hash or checking the work of the hash?

Step                                            Old (wrong)                     New (right)
Push: verify objects arrived in MinIO           HEAD each object (ghost guard)  sha256(bundle) == bundle_id proves it
Push: store snapshot blobs in MinIO separately  1,018 individual PUTs           snapshots are in the bundle, indexed once
Push: store commit blobs in MinIO separately    1,024 individual PUTs           commits are in the bundle, indexed once
Fetch: verify each received object              per-object hash check loop      sha256(bundle) proves all contents at once

Performance targets (GitHub parity)

Benchmark: 1,024-commit repo, 5,139 file objects, ~128 MB total.

Command                          Today         Target
push (fresh)                     2m 44s        < 15s
fetch (fresh clone)              not measured  < 15s
pull (incremental, ~10 commits)  not measured  < 2s
ls-remote                        < 1s          < 1s (already fine)
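
Some quick arithmetic on the benchmark, using only figures stated in this ticket. The 7,181 round-trips quoted for push appear to be the sum of the file objects and the separate snapshot and commit PUTs; the per-round-trip average assumes, as a simplification, that the whole push time is spent on round-trips.

```python
file_objects = 5_139   # benchmark repo
snapshot_puts = 1_018  # from the checklist table
commit_puts = 1_024    # from the checklist table
round_trips = file_objects + snapshot_puts + commit_puts

push_today_s = 2 * 60 + 44  # 2m 44s
avg_ms_per_round_trip = push_today_s / round_trips * 1000

total_mb = 128
target_s = 15
min_throughput_mb_s = total_mb / target_s  # end to end, all five steps included

print(round_trips)                       # 7181
print(round(avg_ms_per_round_trip, 1))   # 22.8
print(round(min_throughput_mb_s, 1))     # 8.5
```

At roughly 23 ms per round-trip there is little to gain from making each one faster; the target is only reachable by removing the round-trips.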

Child tickets

  • #45 — Push: bundle-based upload (implementation spec with exact code)
  • #47 — Fetch/Clone: bundle-based download (mirror of #45)

Work in order: push first (data must get to server before fetch can bring it back).


Branches

  • ~/ecosystem/musetask/bundle-push (client changes)
  • ~/ecosystem/musehubtask/bundle-push (server changes)
Activity (2)
gabriel opened this issue 34 days ago
gabriel 34 days ago

Master performance gates

Every command has a pass/fail gate. A command is not done until it passes.

Push (#45)

Step                                 Gate
1. Negotiate                         < 500ms
2. Pack                              < 5s
3. Transfer (bundle PUT)             < 5s
4. Index (server unpack + DB)        < 10s
5. Advance ref                       < 500ms
Total (time muse push, local dev)    < 15s

Fetch / Clone (#47) — do not start until push passes

Step                                    Gate
1. Negotiate                            < 500ms
2. Pack (server builds bundle)          < 10s
3. Transfer (bundle GET)                < 5s
4. Unpack (client writes local store)   < 5s
5. Advance ref                          < 100ms
Total (time muse clone <url>)           < 15s

Pull — falls out of fetch

Pull = fetch + local merge. When fetch passes its gate, pull passes too.

ls-remote, release push, tag push

Already pass. No work needed.


The rule: measure each step individually. If a step fails its gate, fix that step before moving on. Do not optimize step 4 while step 3 is still failing.
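
One way to apply the rule mechanically is sketched below, using the push gates listed above. The function names and the sample timings are made up for illustration. Note that the five step gates are individual ceilings that add up to 21 s, so passing every step gate does not by itself guarantee the 15 s total.

```python
PUSH_GATES_S = [
    ("negotiate", 0.5),
    ("pack", 5.0),
    ("transfer", 5.0),
    ("index", 10.0),
    ("advance_ref", 0.5),
]
PUSH_TOTAL_GATE_S = 15.0


def first_failing_step(measured, gates=PUSH_GATES_S):
    """Return the earliest step whose time is not under its gate, or None."""
    for step, limit in gates:
        if measured[step] >= limit:
            return step
    return None


def passes_total(measured, total_gate=PUSH_TOTAL_GATE_S):
    """True when the summed step times stay under the total gate."""
    return sum(measured.values()) < total_gate


# Made-up timings: transfer and index both fail, transfer is reported first.
run = {"negotiate": 0.3, "pack": 4.0, "transfer": 6.2, "index": 12.0, "advance_ref": 0.1}
print(first_failing_step(run))
print(sum(limit for _, limit in PUSH_GATES_S))
```

Reporting only the earliest failing step keeps attention on step 3 while step 4 is also slow, which is the ordering the rule asks for.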

gabriel 34 days ago

Stream path — architectural intent (do not delete)

The stream path (push/stream, O frames) is not legacy code. It is the future micropayment-capable push path and must be preserved permanently alongside the bundle/presign path.

Why two paths exist:

  • Presign/bundle path: bulk efficiency. Client packs everything into one MPackBundle, uploads via a single presigned PUT directly to MinIO (bypassing Cloudflare), server unpacks in one pass. Optimal for large pushes. The server never sees individual objects in flight, only the finished bundle, so there is no per-object hook for payment.

  • Stream path (O frames) — real-time, per-object, payment-capable transfer. Each O frame carries one content-addressed object. sha256:<hex> is simultaneously the integrity proof and the billable unit — you cannot fake delivery because the frame IS the proof. No one has done per-HTTP-frame streaming micropayments bound to content-addressed objects. This is the primitive for that.

Future direction: HTTP/2 bidirectional multiplexing + per-frame MSign-signed micropayment authorization. Each O frame in flight carries a payment channel update authorizing payment for exactly that object. The presign path cannot do this: the bundle bypasses the server during transfer, so there is no per-object moment at which to authorize payment.

Implication for this epic: the stream path is not a code path to optimize away. It needs security parity with the bundle path (tracked in issue #51) and then becomes the foundation for the payment layer.