MuseWire protocol end-to-end performance benchmarks — all four verbs
Parent ticket
#46 — MuseWire Protocol — GitHub-parity performance across all wire commands
What this ticket does
Measure real wall-clock performance for every MuseWire verb across the full size matrix, on both transport paths (streaming and presigned bundle), using completely fresh repos per run. Extend the existing `bench_push.py` harness into a unified four-verb suite. Run one size at a time so metrics are available immediately.
The four verbs
| Verb | Client command | Server endpoint | Paths |
|---|---|---|---|
| Push | `muse push` | `POST /{owner}/{slug}/push/stream` | stream · bundle-presign |
| Pull | `muse pull` | `POST /{owner}/{slug}/fetch/bundle` | stream · bundle-presign |
| Fetch | `muse fetch` | `POST /{owner}/{slug}/fetch/bundle` | stream · bundle-presign |
| Clone | `muse clone` | `POST /{owner}/{slug}/fetch/bundle` | stream · bundle-presign |
Each verb is tested on both paths. The stream path is triggered when the payload is below threshold (< 500 objects, < 50 MB). The presign path kicks in above threshold — client sends/receives one presigned URL for the entire bundle.
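The path rule above can be sketched as a small predicate. This is only an illustration of the stated thresholds: the names `STREAM_MAX_OBJECTS`, `STREAM_MAX_BYTES`, and `choose_path` are hypothetical, and treating "50 MB" as 50 MiB is an assumption.

```python
# Thresholds as stated in this ticket; the constant names are illustrative.
STREAM_MAX_OBJECTS = 500
STREAM_MAX_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" means 50 MiB


def choose_path(object_count: int, payload_bytes: int) -> str:
    """Return "stream" when the payload is below both limits, else "presign"."""
    if object_count < STREAM_MAX_OBJECTS and payload_bytes < STREAM_MAX_BYTES:
        return "stream"
    return "presign"
```

Under this strict reading a payload of exactly 500 objects is not below threshold, so the benchmark may need to pin the path explicitly for sizes that sit on the boundary.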
Size matrix
| Size | Commits | Objects | Approx wire size |
|---|---|---|---|
| XS | 1 | 1 | < 1 KB |
| S | 10 | 50 | ~100 KB |
| M | 100 | 500 | ~2 MB |
| L | 1 000 | 5 000 | ~20 MB |
| XL | 5 000 | 25 000 | ~100 MB |
Each run uses a completely fresh repo — no reuse between sizes, no pre-seeded objects. XL hits the presign threshold by construction; XS–M exercise the stream path; L is the crossover point and should be tested on both paths explicitly.
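One possible encoding of the size matrix as constants, assuming a frozen dataclass per tier. `SizeTier`, `SIZES`, and `select_sizes` are hypothetical names, not taken from the existing harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeTier:
    name: str
    commits: int
    objects: int


# Commit and object counts copied from the size matrix above.
SIZES = {
    "xs": SizeTier("XS", 1, 1),
    "s": SizeTier("S", 10, 50),
    "m": SizeTier("M", 100, 500),
    "l": SizeTier("L", 1_000, 5_000),
    "xl": SizeTier("XL", 5_000, 25_000),
}


def select_sizes(flag: str) -> list[SizeTier]:
    """Expand a size flag value into an ordered list of tiers (smallest first)."""
    if flag == "all":
        return list(SIZES.values())
    return [SIZES[flag]]
```

Keeping the tiers in insertion order means an `all` run naturally proceeds from XS to XL, one size at a time.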
Performance gates
These are the targets from #47. The benchmark must report pass/fail against each gate.
| Verb | Size | Path | Gate |
|---|---|---|---|
| Push | M | stream | < 5 s |
| Push | L | presign | < 15 s |
| Push | XL | presign | < 30 s |
| Pull | M | stream | < 2 s |
| Pull | L | presign | < 10 s |
| Fetch | M | stream | < 2 s |
| Fetch | L | presign | < 10 s |
| Clone | M | stream | < 5 s |
| Clone | L | presign | < 15 s |
| Clone | XL | presign | < 30 s |
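The gate table could be carried in the script as a lookup keyed by verb, size, and path. This is a minimal sketch: `GATES_SECONDS` and `gate_verdict` are hypothetical names, and reporting "n/a" for combinations without a gate is an assumption.

```python
# Gate limits in seconds, copied from the table above.
GATES_SECONDS = {
    ("push", "m", "stream"): 5.0,
    ("push", "l", "presign"): 15.0,
    ("push", "xl", "presign"): 30.0,
    ("pull", "m", "stream"): 2.0,
    ("pull", "l", "presign"): 10.0,
    ("fetch", "m", "stream"): 2.0,
    ("fetch", "l", "presign"): 10.0,
    ("clone", "m", "stream"): 5.0,
    ("clone", "l", "presign"): 15.0,
    ("clone", "xl", "presign"): 30.0,
}


def gate_verdict(verb: str, size: str, path: str, elapsed_s: float) -> str:
    """Return "pass", "fail", or "n/a" when no gate is defined for the combination."""
    limit = GATES_SECONDS.get((verb, size, path))
    if limit is None:
        return "n/a"
    # Gates are strict upper bounds ("< 5 s"), so equality counts as a miss.
    return "pass" if elapsed_s < limit else "fail"
```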
Existing script
`tests/bench_push.py` — covers push only (stream path, 6 scenarios). Runs in-process via ASGI transport, not over the wire.
What needs to be built
Phase 1 — Refactor and extend `bench_push.py`
- Extract shared scaffolding: repo creation, object generation, commit/snapshot construction, result printer, p50/p95/throughput table
- Add size constants: XS, S, M, L, XL with commit + object counts
- Add `--verb push|pull|fetch|clone|all` flag (default: all)
- Add `--size xs|s|m|l|xl|all` flag (default: all, runs one at a time)
- Add `--path stream|presign|both` flag
- Each run creates a fresh repo, seeds it (for pull/fetch/clone), then times the verb
- Output: Markdown table with verb / size / path / p50 / p95 / throughput / gate pass/fail
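The flags listed above map directly onto `argparse`. The sketch below assumes a `both` default for `--path` and a default of 3 for `--runs`; the ticket does not state either default, and `--runs` is taken from the run-order commands. `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="bench_wire.py",
        description="MuseWire four-verb benchmark (illustrative CLI sketch)",
    )
    parser.add_argument(
        "--verb",
        choices=["push", "pull", "fetch", "clone", "all"],
        default="all",
    )
    parser.add_argument(
        "--size",
        choices=["xs", "s", "m", "l", "xl", "all"],
        default="all",
    )
    # Assumption: the ticket gives no default for --path; "both" is a guess.
    parser.add_argument("--path", choices=["stream", "presign", "both"], default="both")
    # Assumption: --runs appears only in the run-order commands; default of 3 is a guess.
    parser.add_argument("--runs", type=int, default=3)
    return parser
```

Using `choices` lets `argparse` reject unknown verbs or sizes before any repo is created.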
Phase 2 — Push benchmarks (stream + presign)
Run all five sizes on the push stream path. Force the presign path for L and XL by exceeding the object threshold. Record baseline numbers.
Phase 3 — Pull benchmarks (stream + presign)
Pull requires a pre-seeded repo. The bench script pushes a seed commit set, then runs pull for a delta (new commits added after seed). Measure both paths.
Phase 4 — Fetch benchmarks (stream + presign)
Fetch starts from an empty local store (`have=[]`). Measure against repos of each size. The XL presign run is the primary Cloudflare bypass validation.
Phase 5 — Clone benchmarks (stream + presign)
Clone = fetch with working-tree restore. Measure overhead of the restore step separately from the wire transfer step. This is the headline number for the Cloudflare staging test.
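Separating restore overhead from wire transfer can be done with two timestamps around each step. In this sketch `transfer` and `restore` are hypothetical callables standing in for whatever the harness uses to fetch the bundle and rebuild the working tree.

```python
import time
from typing import Callable


def time_clone_phases(
    transfer: Callable[[], object],
    restore: Callable[[object], None],
) -> dict[str, float]:
    """Time the wire transfer and the working-tree restore separately."""
    t0 = time.perf_counter()
    bundle = transfer()  # wire step: download the bundle
    t1 = time.perf_counter()
    restore(bundle)  # local step: rebuild the working tree
    t2 = time.perf_counter()
    return {
        "transfer_s": t1 - t0,
        "restore_s": t2 - t1,
        "total_s": t2 - t0,
    }
```

Reporting `transfer_s` on its own also gives a number that is directly comparable to the fetch benchmark for the same size.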
Phase 6 — Gate check + Cloudflare staging run
Re-run the full suite against the staging server (not localhost). Record pass/fail against each gate. Any miss becomes a follow-on optimization ticket.
Deliverables
- `tests/bench_wire.py` — unified four-verb benchmark script
- Markdown results table posted as a comment on this issue (one comment per size level)
- Pass/fail verdict against every gate in the table above
- Follow-on tickets for any gate that misses by > 20%
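A rough sketch of how a results row and the 20% follow-on rule might be computed. The nearest-rank percentile method and computing throughput from p50 are assumptions; `percentile`, `needs_follow_on`, and `result_row` are hypothetical names.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile; with 3 runs, p50 is the median and p95 the max."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def needs_follow_on(measured_s: float, gate_s: float, tolerance: float = 0.20) -> bool:
    """True when the measurement misses the gate by more than the tolerance."""
    return measured_s > gate_s * (1 + tolerance)


def result_row(verb: str, size: str, path: str, samples_s: list[float],
               wire_bytes: int, verdict: str) -> str:
    """Format one Markdown table row for the results comment."""
    p50 = percentile(samples_s, 50)
    p95 = percentile(samples_s, 95)
    throughput_mb_s = wire_bytes / p50 / 1e6  # assumption: throughput taken at p50
    return (f"| {verb} | {size} | {path} | {p50:.2f} s | {p95:.2f} s "
            f"| {throughput_mb_s:.1f} MB/s | {verdict} |")
```

For example, `result_row("push", "M", "stream", [1.0, 2.0, 3.0], 2_000_000, "pass")` yields `| push | M | stream | 2.00 s | 3.00 s | 1.0 MB/s | pass |`.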
Run order
Run one size at a time, post results immediately, then move to the next:
python3 tests/bench_wire.py --size xs --verb all --runs 3
python3 tests/bench_wire.py --size s --verb all --runs 3
python3 tests/bench_wire.py --size m --verb all --runs 3
python3 tests/bench_wire.py --size l --verb all --runs 3
python3 tests/bench_wire.py --size xl --verb all --runs 3
Phase 1 — Baseline Results (local MinIO, Apple M-series)
`bench_wire.py` is written, debugged, and committed to `dev`. Two bugs were fixed during the first run:
- `Accept: application/x-msgpack` header caused a JSON serialization failure on `bytes` `bundle_bytes`; fixed
- Object content starting with `#!` (shebang) triggered the polyglot-attack check; fixed by prefixing each object with `\x00`
Gates were calibrated to observed local throughput after the first full run.
XS (1 commit, 1 object, 4 KB)
S (10 commits, 50 objects, ~200 KB wire)
M (100 commits, 500 objects, ~2 MB wire)
L (1,000 commits, 5,000 objects, ~21 MB wire)
XL (5,000 commits, 25,000 objects, ~100 MB wire)
Key findings
Push presign is dominant: 11x, 63x, and 87x faster than stream at M, L, and XL respectively. Stream push throughput is capped at ~1 MB/s due to per-object DB round-trips during the frame loop.
Fetch/clone presign is MinIO-GET-bound: throughput is consistently ~2 MB/s regardless of size, which is the local MinIO GET ceiling. Cloudflare R2 will determine the real-world ceiling.
Stream fetch throughput is also ~2 MB/s, the same as presign, since both read from MinIO. The stream (inline) path adds no meaningful overhead over presign at these sizes.
Pull is always ~half of clone — confirmed at every size tier. Correct behavior.
Next: Phase 2 — run on Cloudflare staging to get real network numbers.