Push performance: 3m 57s for 1,024-commit full push — four measured bottlenecks, four targeted fixes
Context
A full push of the musehub repo (1,024 commits, 5,139 objects, ~128 MB) to a fresh local MuseHub instance takes 3 minutes 57 seconds. This is measured, reproducible, and broken down by phase. The wire protocol architecture (presign → R2 PUT → confirm → stream) is correct and appropriate for our Cloudflare constraints. The problems are all implementation bugs in MuseHub — the Muse CLI is fast.
This issue tracks four independent fixes in order of impact. Each phase has a measurable before/after. Do them in order. Do not skip ahead.
Measured baseline (reproduce with the command below)
```shell
# Delete and recreate the local musehub repo, then:
time muse -C ~/ecosystem/musehub push local dev
```
Expected output matches this timing breakdown:
| Step | Measured | Where | Status |
|---|---|---|---|
| Client: object load + walk | 3.8s | Muse CLI | Already fast |
| POST /push/presign | 37.2s | MuseHub | BROKEN |
| Client: R2 PUTs (128 MB, 5,139 objects) | 17.1s | Network | Acceptable |
| POST /push/confirm | 122.8s | MuseHub | BROKEN |
| Remote frame loop (1,024 commits) | 22.6s | MuseHub | BROKEN |
| Remote phase 7: snapshot upsert (1,018 snapshots) | 10.6s | MuseHub | Acceptable for now |
| Remote phase 8: commit INSERT (1,024 commits) | 14.3s | MuseHub | Acceptable for now |
| Total | 3m 57s | | Target: < 60s |
Phase 1 — Fix push/confirm: bulk insert, not per-object inserts
Impact: ~123s → ~2s. This is the single biggest win.
What we know
POST /push/confirm takes 122.8s to register 5,139 objects, about 24ms per object. At that rate, the most likely explanation is that the handler issues individual `session.execute(insert)` calls in a loop, one DB round trip per object.
What to do
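- Find the confirm view/handler in `musehub/views/` or `musehub/services/`.
- Look at how it writes object rows to the DB. It will probably look something like:

  ```python
  for obj in objects:
      # One INSERT per object: one DB round trip each.
      await session.execute(insert(BlobObject).values(...).on_conflict_do_nothing())
  await session.commit()
  ```

- Replace it with a single bulk operation using `unnest` or SQLAlchemy's `insert().values([...])`:

  ```python
  # on_conflict_do_nothing() exists on the PostgreSQL dialect's insert(), not on sqlalchemy.insert().
  from sqlalchemy.dialects.postgresql import insert

  await session.execute(
      insert(BlobObject)
      .values([{...} for obj in objects])  # placeholder: one dict of column values per object
      .on_conflict_do_nothing()
  )
  await session.commit()
  ```

- If the list is large (> 5,000), chunk it into batches of 1,000, but each chunk must be one `execute`, not one per row.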
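The `unnest` option listed above can be sketched as follows. This is a minimal sketch, not MuseHub's code: the column `size_bytes`, the unique index on `object_id`, and the function name `bulk_register_objects` are assumptions, and `execute` stands for any async callable with an asyncpg-style `execute(query, *args)` signature. Sending each column as one array keeps every statement at exactly two bind parameters, so chunking here only bounds payload size; with `insert().values([...])` the parameter count grows with rows × columns, which is why chunking matters more for that form.

```python
from typing import Awaitable, Callable, Sequence

# Hypothetical schema: blob_objects(object_id text PRIMARY KEY, size_bytes bigint).
# ON CONFLICT (object_id) assumes a unique index on object_id.
UNNEST_INSERT = (
    "INSERT INTO blob_objects (object_id, size_bytes) "
    "SELECT * FROM unnest($1::text[], $2::bigint[]) "
    "ON CONFLICT (object_id) DO NOTHING"
)

# Any async callable with an asyncpg-style execute(query, *args) signature.
Execute = Callable[..., Awaitable[object]]


async def bulk_register_objects(
    execute: Execute,
    objects: Sequence[tuple[str, int]],
    chunk_size: int = 1000,
) -> int:
    """Register (object_id, size_bytes) pairs with one statement per chunk.

    Returns the number of statements (DB round trips) issued.
    """
    round_trips = 0
    for start in range(0, len(objects), chunk_size):
        chunk = objects[start:start + chunk_size]
        # Send columns as two arrays; unnest() zips them back into rows on the
        # server, so every statement has exactly two bind parameters.
        ids = [object_id for object_id, _ in chunk]
        sizes = [size for _, size in chunk]
        await execute(UNNEST_INSERT, ids, sizes)
        round_trips += 1
    return round_trips
```

With asyncpg, a connection's `conn.execute` can be passed in directly. For 5,139 objects and the default chunk size this issues 6 statements instead of 5,139; commit once after the loop.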
Success criterion
POST /push/confirm for 5,139 objects completes in < 5 seconds. Write a test that inserts 5,000 object rows via the confirm endpoint and asserts the handler returns in < 5s.
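A timing-bound assertion like this one can be built on a small helper. The sketch below is illustrative only; the name `within` is hypothetical. It uses `asyncio.wait_for`, so an over-budget call is cancelled and fails the test with `asyncio.TimeoutError` instead of hanging.

```python
import asyncio
import time
from typing import Awaitable, TypeVar

T = TypeVar("T")


async def within(awaitable: Awaitable[T], budget_s: float) -> tuple[T, float]:
    """Await `awaitable` and return (result, elapsed_seconds).

    Raises asyncio.TimeoutError (cancelling the operation) if it runs
    longer than `budget_s` seconds.
    """
    start = time.perf_counter()
    result = await asyncio.wait_for(awaitable, timeout=budget_s)
    return result, time.perf_counter() - start
```

A Phase 1 test would wrap the confirm request, for example `resp, secs = await within(client.post(confirm_url, json=payload), 5.0)`, where `client`, `confirm_url` and the payload of 5,000 object rows are assumptions about the test setup.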
New expected total after Phase 1
3m 57s − ~121s = ~1m 56s
Phase 2 — Fix push/presign: parallelize existence checks and URL generation
Impact: ~37s → ~5s.
What we know
POST /push/presign takes 37.2s to process 5,139 object IDs. It needs to:
- Check which objects already exist in storage (ghost-guard)
- Generate presigned PUT URLs for the ones that don't
37.2s / 5,139 objects ≈ 7ms per object, which points to sequential processing. Both steps can run concurrently with async I/O.
What to do
- Find the presign view/handler.
- Find the existence check. It will probably be calling `backend.exists(oid)` or `backend.presign_put(oid)` in a loop or with low concurrency.
- The limits table in the wire protocol doc says: "server-side ghost-guard concurrency: 50 parallel R2 HEAD checks" and "server-side presign_put concurrency: 50 parallel presigned URL generations". Verify these limits are actually enforced with `asyncio.gather` in batches of 50, not sequentially.
- If concurrency is already 50, the bottleneck may be the DB query for existing object IDs. Replace the per-object DB check with a single `SELECT object_id FROM blob_objects WHERE object_id = ANY(:ids)` and compute the not-yet-stored set in Python.
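The steps above combine into one shape: one lookup query, a set difference in Python, then bounded-concurrency presigning. This is a sketch under assumptions: `fetch_known` and `presign_put` are hypothetical stand-ins for MuseHub internals. It swaps "batches of 50" for an `asyncio.Semaphore(50)`: a fixed batch waits for its slowest call before the next batch starts, while a semaphore refills a slot as soon as any call finishes, keeping up to 50 in flight the whole time.

```python
import asyncio
from typing import Awaitable, Callable, Iterable

# Hypothetical stand-ins for MuseHub internals:
#   fetch_known(ids): ids already stored, via ONE query such as
#                     SELECT object_id FROM blob_objects WHERE object_id = ANY(:ids)
#   presign_put(oid): presigned PUT URL for a single object
FetchKnown = Callable[[list[str]], Awaitable[Iterable[str]]]
PresignPut = Callable[[str], Awaitable[str]]


async def presign_missing(
    object_ids: list[str],
    fetch_known: FetchKnown,
    presign_put: PresignPut,
    concurrency: int = 50,
) -> dict[str, str]:
    """Return {object_id: url} for every requested object not already stored."""
    known = set(await fetch_known(object_ids))  # one round trip, not N
    # dict.fromkeys drops duplicate ids while keeping request order.
    missing = [oid for oid in dict.fromkeys(object_ids) if oid not in known]

    sem = asyncio.Semaphore(concurrency)

    async def sign(oid: str) -> tuple[str, str]:
        async with sem:  # never more than `concurrency` calls in flight
            return oid, await presign_put(oid)

    return dict(await asyncio.gather(*(sign(oid) for oid in missing)))
```

The same semaphore pattern would apply to the ghost-guard R2 HEAD checks.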
Success criterion
POST /push/presign for 5,139 objects completes in < 8 seconds. The lookup of already-stored object IDs should be a single batch DB query, not N individual queries.
New expected total after Phase 2
1m 56s − ~32s = ~1m 24s
Phase 3 — Fix the frame loop: identify what is blocking per 50-commit chunk
Impact: ~22.6s → ~3s.
What we know
The Muse CLI assembled all 1,024 commits into C frames in 97ms. The server took 22.6s to consume them — about 1.1s per 50-commit chunk. The client is not slow. Something on the server is blocking between chunks.
The PROGRESS output shows:
```text
[+0.9s]: chunk received: 50 commits, total 50
[+3.2s]: chunk received: 50 commits, total 100   ← 2.3s gap
[+4.5s]: chunk received: 50 commits, total 150   ← 1.3s gap
```
This is not network latency (the push runs over loopback). Something in the frame loop is doing synchronous or blocking work per chunk.
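To see which chunks stall across the full run, the PROGRESS log can be scanned for gaps between consecutive `chunk received` lines. A minimal sketch, assuming every such line has the shape shown in the excerpt above; `chunk_gaps` is a hypothetical helper, not part of the Muse CLI.

```python
import re

# Assumes PROGRESS lines shaped like:
# "[+3.2s]: chunk received: 50 commits, total 100"
CHUNK_LINE = re.compile(
    r"\[\+(?P<t>\d+(?:\.\d+)?)s\]: chunk received: \d+ commits, total (?P<total>\d+)"
)


def chunk_gaps(lines):
    """Return [(running_total, seconds_since_previous_chunk), ...]."""
    points = []
    for line in lines:
        m = CHUNK_LINE.search(line)
        if m:
            points.append((float(m.group("t")), int(m.group("total"))))
    # Pair each chunk with the one before it and measure the gap.
    return [
        (total, round(t - prev_t, 3))
        for (prev_t, _), (t, total) in zip(points, points[1:])
    ]
```

`max(chunk_gaps(lines), key=lambda p: p[1])` then points at the slowest chunk to instrument first.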
What to do
- Find `wire_push_stream` in `musehub/services/musehub_wire.py`, specifically the section that reads COMMIT_PACK frames and emits the `chunk received` PROGRESS message.
- Look at what happens between emitting that PROGRESS frame and yielding back to the HTTP layer to read the next chunk. There may be:
  - A DB query per chunk (e.g. checking whether commits are already known)
  - A synchronous blob operation
  - An `await session.execute` that is doing per-commit work during frame receive
- The frame loop should do NOTHING except parse frames and accumulate commits/snapshots in memory. All DB work should happen AFTER the frame loop completes (phases 3b onward).
- If there is any DB work inside the frame loop, move it to after the END frame is received.
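The target shape of the frame loop, receive and accumulate first and persist once after END, can be sketched like this. It is a hypothetical model, not `wire_push_stream` itself: the `Frame` dataclass, the `persist` callback and the message strings are assumptions. It also emits a total `frame_loop` duration line, matching the success criterion that follows.

```python
import time
from dataclasses import dataclass, field
from typing import AsyncIterator, Awaitable, Callable


@dataclass
class Frame:
    """Hypothetical frame shape; the real wire format lives in musehub_wire.py."""
    kind: str  # e.g. "COMMIT_PACK" or "END"
    commits: list = field(default_factory=list)
    snapshots: list = field(default_factory=list)


async def receive_then_persist(
    frames: AsyncIterator[Frame],
    persist: Callable[[list, list], Awaitable[None]],
    progress: Callable[[str], None] = print,
) -> float:
    """Accumulate frames in memory, then persist once; return loop seconds."""
    commits: list = []
    snapshots: list = []
    start = time.perf_counter()
    async for frame in frames:
        if frame.kind == "END":
            break
        if frame.kind == "COMMIT_PACK":
            # Parse and accumulate only: no DB or blob I/O inside the loop.
            commits.extend(frame.commits)
            snapshots.extend(frame.snapshots)
            progress(f"chunk received: {len(frame.commits)} commits, total {len(commits)}")
    loop_seconds = time.perf_counter() - start
    progress(f"frame_loop: {len(commits)} commits in {loop_seconds:.2f}s")
    # All DB work happens once, after END has been received.
    await persist(commits, snapshots)
    return loop_seconds
```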
Success criterion
The frame loop processes 1,024 commits (21 chunks: 20 × 50 commits plus a final chunk of 24) in < 4 seconds on loopback. Add a PROGRESS timing line for total frame_loop duration and assert < 4s in a test.
New expected total after Phase 3
1m 24s − ~19s = ~1m 5s
Phase 4 — Snapshot upsert and commit INSERT batch sizing
Impact: ~25s → ~8s. Smaller win, but gets us under 60s.
What we know
- Phase 7 (snapshot upsert, 1,018 snapshots): 10.6s in 11 batches. Two batches took 2.6s each; that variance points to either a slow blob PUT or an unusually slow DB batch. The batch size appears to be ~100 snapshots.
- Phase 8 (commit INSERT, 1,024 commits): 14.3s in 11 batches of ~100, about 1.3s per batch. This is the phase where the earlier 30s anomaly was fixed; now the batch INSERT itself is the bottleneck.
What to do
- Phase 7: Check whether each snapshot batch does a blob PUT before the DB upsert. If so, profile whether the blob PUT or the DB upsert dominates. If it's the DB upsert, increase batch size from 100 to 500.
- Phase 8: the commit INSERT batch does `blob_put` + `execute` + `commit`. The `blob_put` is now fast (5ms, fixed in the previous session). The `execute` is an ON CONFLICT upsert. Check whether the upsert uses `unnest` for the full batch or individual row inserts; if individual, convert to bulk.
- For both: if blob PUTs are already fast, the DB insert should dominate. Verify the batch size. For 1,024 commits, 4 batches of 256 (4 round trips and commits instead of 11) should be faster than 11 batches of ~100.
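To decide whether the blob PUT or the DB upsert dominates each batch, per-step time can be accumulated across batches. A minimal sketch; `PhaseTimer` and `batches` are hypothetical helpers, and the labels are arbitrary.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time and call counts per label across batches."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def measure(self, label):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[label] += time.perf_counter() - start
            self.counts[label] += 1

    def report(self):
        """Return {label: (calls, total_seconds)}."""
        return {k: (self.counts[k], round(self.totals[k], 3)) for k in self.totals}


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

Inside the existing async batch loop, wrap the two steps as `with timer.measure("blob_put"): await blob_put(...)` and `with timer.measure("upsert"): await session.execute(...)`, then log `timer.report()`. A synchronous context manager is fine here because the `await` stays inside the `with` body. Changing the `batches(commits, ...)` size from 100 to 256 turns 11 batches into 4.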
Success criterion
- Phase 7 completes in < 4s for 1,018 snapshots
- Phase 8 completes in < 5s for 1,024 commits (blob_put already fast; this is pure DB)
New expected total after Phase 4
1m 5s − ~17s = ~48s
Target
Under 60 seconds for a 1,024-commit, 5,139-object full push to local MuseHub. This is the benchmark. Run it after each phase:
```shell
# Delete the local repo, recreate it, then:
time muse -C ~/ecosystem/musehub push local dev
```
After all four phases, the expected breakdown:
| Step | Before | After |
|---|---|---|
| Client work + R2 PUTs | 21s | 21s (unchanged, already fast) |
| POST /push/presign | 37s | < 8s |
| POST /push/confirm | 123s | < 5s |
| Frame loop | 22.6s | < 4s |
| Phase 7 + 8 | 25s | < 9s |
| Total | 3m 57s | ~48s |
Rules
- Fix one phase at a time. Measure before and after each phase. Do not combine phases.
- Every fix gets a test that asserts the timing bound. Not just correctness — timing.
- No architectural changes. The presign/confirm flow is correct. The wire protocol is correct. The bugs are all in the implementation of specific endpoints and the frame loop.
- Do not spend time on phase 4 until phases 1–3 are done. Phase 1 alone cuts about 51% of total time (~121s of 237s).
Resolved — superseded by #45 and #46
The 3m 57s push time this issue diagnosed is gone. Actual numbers on a 1,043-commit / 5,197-object repo today:
Root cause (identified in #44, fixed in #45): every push serialized objects individually to MinIO. The fix packs everything into one MPackBundle, uploads it as a single presigned PUT, and has the server unpack it in one pass.
Closing as resolved.