Closed #43 Performance

filed by gabriel human · 35 days ago

Push performance: 3m 57s for 1,024-commit full push — four measured bottlenecks, four targeted fixes

Context

A full push of the musehub repo (1,024 commits, 5,139 objects, ~128 MB) to a fresh local MuseHub instance takes 3 minutes 57 seconds. This is measured, reproducible, and broken down by phase. The wire protocol architecture (presign → R2 PUT → confirm → stream) is correct and appropriate for our Cloudflare constraints. The problems are all implementation bugs in MuseHub — the Muse CLI is fast.

This issue tracks four independent fixes in order of impact. Each phase has a measurable before/after. Do them in order. Do not skip ahead.

Measured baseline (reproduce with)

# Delete and recreate local musehub repo, then:
time muse -C ~/ecosystem/musehub push local dev

The run should reproduce this timing breakdown:

Step	Measured	Where
Client: object load + walk	3.8s	Muse CLI — already fast
`POST /push/presign`	37.2s	MuseHub — BROKEN
Client: R2 PUTs (128MB, 5,139 objects)	17.1s	Network — acceptable
`POST /push/confirm`	122.8s	MuseHub — BROKEN
Remote frame loop (1,024 commits)	22.6s	MuseHub — BROKEN
Remote phase 7: snapshot upsert (1,018 snaps)	10.6s	MuseHub — acceptable for now
Remote phase 8: commit INSERT (1,024 commits)	14.3s	MuseHub — acceptable for now
Total	3m 57s	Target: < 60s

Phase 1 — Fix `push/confirm`: bulk insert, not per-object inserts

Impact: ~123s → ~2s. This is the single biggest win.

What we know

POST /push/confirm takes 122.8s to register 5,139 objects, about 24ms per object. The most likely way to spend ~24ms per object on Postgres is one DB round trip per object: the handler is probably issuing individual session.execute(insert) calls in a loop.

What to do

Find the confirm view/handler in musehub/views/ or musehub/services/.

Look at how it writes object rows to the DB. It probably looks something like this, with one DB round trip per object:

for obj in objects:
    await session.execute(insert(BlobObject).values(...).on_conflict_do_nothing())
await session.commit()

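Replace it with a single bulk statement, using either unnest or SQLAlchemy's insert().values([...]). Note that on_conflict_do_nothing() is provided by the PostgreSQL dialect's insert, not by the generic sqlalchemy.insert, so import it from sqlalchemy.dialects.postgresql:

from sqlalchemy.dialects.postgresql import insert

# One statement (one round trip) for all rows instead of one per object
await session.execute(
    insert(BlobObject)
    .values([{...} for obj in objects])
    .on_conflict_do_nothing()
)
await session.commit()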

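If the list is large (> 5,000 rows), chunk it into batches of 1,000, but keep it to one execute per batch, not one per row. Chunking also keeps each statement under the driver's per-statement bind-parameter limit, which counts rows × columns.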
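A runnable sketch of the per-batch pattern follows. It uses the standard-library sqlite3 module as a stand-in so it runs without a Postgres server; on MuseHub the equivalent per-batch statement would be the SQLAlchemy insert(...).values(batch).on_conflict_do_nothing() call above. The size_bytes and storage_key columns are hypothetical, and the batch size of 200 is chosen only to stay under older SQLite builds' 999-variable cap; on Postgres the 1,000-row batches suggested above apply.

```python
from __future__ import annotations

import sqlite3
from collections.abc import Iterator, Sequence

# Small on purpose: 200 rows x 3 columns = 600 bind variables per statement,
# which fits even older SQLite builds (999-variable default cap).
BATCH_SIZE = 200


def chunked(rows: Sequence[tuple], size: int) -> Iterator[Sequence[tuple]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_objects_bulk(
    conn: sqlite3.Connection,
    rows: Sequence[tuple[str, int, str]],
    batch_size: int = BATCH_SIZE,
) -> int:
    """Insert rows with ONE multi-row INSERT per batch; return statements issued."""
    statements = 0
    for batch in chunked(rows, batch_size):
        placeholders = ", ".join(["(?, ?, ?)"] * len(batch))
        params = [value for row in batch for value in row]  # flatten row tuples
        conn.execute(
            "INSERT INTO blob_objects (object_id, size_bytes, storage_key) "
            f"VALUES {placeholders} ON CONFLICT (object_id) DO NOTHING",
            params,
        )
        statements += 1
    conn.commit()  # single commit at the end, as in the handler
    return statements


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blob_objects "
    "(object_id TEXT PRIMARY KEY, size_bytes INTEGER, storage_key TEXT)"
)
rows = [(f"sha256:{i:064x}", i, f"objects/{i}") for i in range(5139)]
print(insert_objects_bulk(conn, rows))  # 26 statements instead of 5,139
print(insert_objects_bulk(conn, rows))  # 26 again; duplicates are skipped
print(conn.execute("SELECT COUNT(*) FROM blob_objects").fetchone()[0])  # 5139
```

Either way, the number of round trips scales with the number of batches (26 here) rather than the number of objects.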

Success criterion

POST /push/confirm for 5,139 objects completes in < 5 seconds. Write a test that inserts 5,000 object rows via the confirm endpoint and asserts the handler returns in < 5s.
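One possible shape for that test, as a sketch: assert_within is a small hypothetical helper built on time.perf_counter, and post_confirm is a placeholder for however the test suite actually calls POST /push/confirm (for example through an HTTP test client). The placeholder only sleeps, so this demonstrates the timing assertion, not the endpoint.

```python
from __future__ import annotations

import asyncio
import time
from collections.abc import Awaitable
from typing import TypeVar

T = TypeVar("T")


async def assert_within(awaitable: Awaitable[T], budget_s: float) -> tuple[T, float]:
    """Await `awaitable` and fail if it takes longer than `budget_s` wall-clock seconds."""
    start = time.perf_counter()
    result = await awaitable
    elapsed = time.perf_counter() - start
    assert elapsed < budget_s, f"took {elapsed:.2f}s, budget was {budget_s:.2f}s"
    return result, elapsed


async def post_confirm(object_ids: list[str]) -> dict:
    """Hypothetical placeholder for the real POST /push/confirm call."""
    await asyncio.sleep(0.01)
    return {"confirmed": len(object_ids)}


async def test_confirm_5000_objects_under_5s() -> None:
    ids = [f"sha256:{i:064x}" for i in range(5000)]
    body, _elapsed = await assert_within(post_confirm(ids), budget_s=5.0)
    assert body["confirmed"] == 5000  # correctness AND timing, per the Rules


asyncio.run(test_confirm_5000_objects_under_5s())
```

Under pytest, an async plugin such as pytest-asyncio could run the same coroutine instead of asyncio.run.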

New expected total after Phase 1

3m 57s − ~121s = ~1m 56s

Phase 2 — Fix `push/presign`: parallelize existence checks and URL generation

Impact: ~37s → ~5s.

What we know

POST /push/presign takes 37.2s to process 5,139 object IDs. It needs to:

Check which objects already exist in storage (ghost-guard)
Generate presigned PUT URLs for the ones that don't

37.2s / 5,139 objects ≈ 7ms per object. That per-object cost points to sequential processing, even though both steps are async-capable.

What to do

Find the presign view/handler.
Find the existence check and the URL generation. They are probably calling backend.exists(oid) and backend.presign_put(oid) in a loop, or with low concurrency.
The limits table in the wire protocol doc says: "server-side ghost-guard concurrency: 50 parallel R2 HEAD checks" and "server-side presign_put concurrency: 50 parallel presigned URL generations". Verify these limits are actually being enforced with asyncio.gather in batches of 50, not sequentially.
If concurrency is already 50, the bottleneck may be in the DB query for existing object IDs. Replace the per-object DB check with a single SELECT object_id FROM blob_objects WHERE object_id = ANY(:ids) and compute the not-yet-stored set in Python.
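Both fixes can be sketched with the standard library: one query returns the stored IDs, a set difference in Python finds what is missing, and an asyncio.Semaphore caps in-flight presign calls at 50. This is an illustration, not MuseHub's code: gather_bounded and not_yet_stored are hypothetical helper names, and fake_presign_put stands in for backend.presign_put. A semaphore is used here instead of fixed batches of 50 so that one slow call does not hold up the other 49; both approaches enforce the same limit.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable, Iterable, Sequence
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")

CONCURRENCY = 50  # limit quoted from the wire-protocol doc


async def gather_bounded(
    items: Sequence[T], worker: Callable[[T], Awaitable[R]], limit: int = CONCURRENCY
) -> list[R]:
    """Run worker(item) for every item with at most `limit` in flight; keeps input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run(item: T) -> R:
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(run(item) for item in items))


def not_yet_stored(requested: Iterable[str], already_stored: Iterable[str]) -> list[str]:
    """Set difference after ONE query such as
    SELECT object_id FROM blob_objects WHERE object_id = ANY(:ids)."""
    stored = set(already_stored)
    return [oid for oid in requested if oid not in stored]


async def demo() -> tuple[list[str], list[str], int]:
    in_flight = 0
    peak = 0

    async def fake_presign_put(oid: str) -> str:
        # Hypothetical stand-in for backend.presign_put(oid); tracks concurrency.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.001)
        in_flight -= 1
        return f"https://example.invalid/put/{oid}"

    requested = [f"obj{i}" for i in range(500)]
    stored = requested[::2]  # pretend the single SELECT found every even-numbered object
    missing = not_yet_stored(requested, stored)
    urls = await gather_bounded(missing, fake_presign_put)
    return missing, urls, peak


missing, urls, peak = asyncio.run(demo())
print(len(missing), len(urls))  # 250 250
print(peak)  # never above CONCURRENCY
```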

Success criterion

POST /push/presign for 5,139 objects completes in < 8 seconds. The existence lookup should be a single batched DB query, not N individual queries.

New expected total after Phase 2

1m 56s − ~32s = ~1m 24s

Phase 3 — Fix the frame loop: identify what is blocking per 50-commit chunk

Impact: ~22.6s → ~3s.

What we know

The Muse CLI assembled all 1,024 commits into C frames in 97ms. The server took 22.6s to consume them — about 1.1s per 50-commit chunk. The client is not slow. Something on the server is blocking between chunks.

The PROGRESS output shows:

[+0.9s]: chunk received: 50 commits, total 50
[+3.2s]: chunk received: 50 commits, total 100   ← 2.3s gap
[+4.5s]: chunk received: 50 commits, total 150   ← 1.3s gap

This is not network latency (loopback). Something in the frame loop is doing synchronous or blocking work per chunk.

What to do

Find wire_push_stream in musehub/services/musehub_wire.py — specifically the section that reads COMMIT_PACK frames and emits the chunk received PROGRESS message.
Look at what happens between emitting that PROGRESS frame and yielding back to the HTTP layer to read the next chunk. There may be:
- A DB query per chunk (e.g. checking if commits are known)
- A synchronous blob operation
- An await session.execute that is doing per-commit work during frame receive
The frame loop should do NOTHING except parse frames and accumulate commits/snapshots in memory. All DB work should happen AFTER the frame loop completes (phases 3b onward).
If there is any DB work inside the frame loop, move it to after the END frame is received.
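The intended shape of the loop can be sketched under stated assumptions: the Frame class, its kind strings, and the persist/progress callbacks are hypothetical simplifications of wire_push_stream, not its real types. The only awaits inside the loop are on the frame source itself; the DB work runs once after END, and a frame_loop timing line is emitted as the success criterion asks.

```python
from __future__ import annotations

import asyncio
import time
from collections.abc import AsyncIterator, Awaitable, Callable
from dataclasses import dataclass, field


@dataclass
class Frame:
    kind: str  # hypothetical frame kinds: "COMMIT_PACK" or "END"
    commits: list[str] = field(default_factory=list)


async def consume_frames(
    frames: AsyncIterator[Frame],
    persist: Callable[[list[str]], Awaitable[None]],
    progress: Callable[[str], None],
) -> int:
    """Parse and accumulate only; all DB work runs once, after END."""
    started = time.perf_counter()
    commits: list[str] = []
    async for frame in frames:
        if frame.kind == "COMMIT_PACK":
            commits.extend(frame.commits)  # in-memory only: no DB awaits here
            progress(f"chunk received: {len(frame.commits)} commits, total {len(commits)}")
        elif frame.kind == "END":
            break
    progress(f"frame_loop: {time.perf_counter() - started:.3f}s")
    await persist(commits)  # snapshot upsert / commit INSERT happen after the loop
    return len(commits)


async def fake_stream(n_commits: int, chunk: int = 50) -> AsyncIterator[Frame]:
    """Hypothetical frame source: 50-commit COMMIT_PACK frames, then END."""
    for start in range(0, n_commits, chunk):
        yield Frame("COMMIT_PACK", [f"c{i}" for i in range(start, min(start + chunk, n_commits))])
    yield Frame("END")


async def demo() -> tuple[int, list[str], list[int]]:
    messages: list[str] = []
    persisted: list[int] = []

    async def persist(commits: list[str]) -> None:
        persisted.append(len(commits))  # stand-in for the post-loop DB phases

    total = await consume_frames(fake_stream(1024), persist, messages.append)
    return total, messages, persisted


total, messages, persisted = asyncio.run(demo())
print(total, len(messages) - 1, persisted)  # 1024 21 [1024]
```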

Success criterion

The frame loop processes 1,024 commits (21 chunks: 20 of 50 commits plus a final chunk of 24) in < 4 seconds on loopback. Add a PROGRESS timing line for the total frame_loop duration and assert < 4s in a test.

New expected total after Phase 3

1m 24s − ~19s = ~1m 5s

Phase 4 — Snapshot upsert and commit INSERT batch sizing

Impact: ~25s → ~8s. Smaller win, but gets us under 60s.

What we know

Phase 7 (snapshot upsert, 1,018 snapshots): 10.6s in 11 batches. Two batches took 2.6s each; that variance points to either a slow blob PUT or an unusually slow DB batch. The batch size appears to be ~100 snapshots.
Phase 8 (commit INSERT, 1,024 commits): 14.3s in 11 batches of ~100, at ~1.3s per batch. This is the phase where we fixed the 30s anomaly; the batch INSERT itself is now the bottleneck.

What to do

Phase 7: Check whether each snapshot batch does a blob PUT before the DB upsert. If so, profile whether the blob PUT or the DB upsert dominates. If it's the DB upsert, increase batch size from 100 to 500.
Phase 8: The commit INSERT batch does blob_put + execute + commit. The blob_put is now fast (5ms, fixed in the previous session). The execute is an ON CONFLICT upsert. Check whether the upsert uses unnest for the full batch or individual row inserts. If individual, convert to bulk.
For both: if blob PUTs are already fast, the DB insert should dominate. Verify batch size. For 1,024 commits, 4 batches of 256 should be faster than 11 batches of ~100.
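To decide which side dominates, per-step wall time can be accumulated across batches. This sketch uses a hypothetical StepTimer helper and stand-ins for the blob PUT (a no-op) and the upsert (a short sleep); it also reproduces the batch counts quoted above (11 batches of ~100 versus 4 of 256 for 1,024 commits).

```python
from __future__ import annotations

import time
from collections import defaultdict
from collections.abc import Iterator, Sequence
from contextlib import contextmanager


class StepTimer:
    """Accumulate wall time per named step so blob PUT and DB upsert can be compared."""

    def __init__(self) -> None:
        self.totals: defaultdict[str, float] = defaultdict(float)

    @contextmanager
    def step(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start


def batches(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fake_blob_put(batch: Sequence) -> None:
    pass  # hypothetical stand-in for the per-batch blob PUT


def fake_db_upsert(batch: Sequence) -> None:
    time.sleep(0.005)  # hypothetical stand-in for the ON CONFLICT upsert round trip


commits = list(range(1024))
timer = StepTimer()
for batch in batches(commits, 256):
    with timer.step("blob_put"):
        fake_blob_put(batch)
    with timer.step("db_upsert"):
        fake_db_upsert(batch)

print(len(list(batches(commits, 100))), len(list(batches(commits, 256))))  # 11 4
print(max(timer.totals, key=timer.totals.get))  # db_upsert
```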

Success criterion

Phase 7 completes in < 4s for 1,018 snapshots
Phase 8 completes in < 5s for 1,024 commits (blob_put already fast; this is pure DB)

New expected total after Phase 4

1m 5s − ~17s = ~48s

Target

Under 60 seconds for a 1,024-commit, 5,139-object full push to local MuseHub. This is the benchmark. Run it after each phase:

# delete local repo, recreate, then:
time muse -C ~/ecosystem/musehub push local dev

After all four phases, the expected breakdown:

Step	Before	After
Client work + R2 PUTs	21s	21s (unchanged — already fast)
`POST /push/presign`	37s	< 8s
`POST /push/confirm`	123s	< 5s
Frame loop	22.6s	< 4s
Phase 7 + 8	25s	< 9s
Total	3m 57s	< 48s

Rules

Fix one phase at a time. Measure before and after each phase. Do not combine phases.
Every fix gets a test that asserts the timing bound. Not just correctness — timing.
No architectural changes. The presign/confirm flow is correct. The wire protocol is correct. The bugs are all in the implementation of specific endpoints and the frame loop.
Do not spend time on phase 4 until phases 1–3 are done. Phase 1 alone removes roughly half of the total time (~121s of 237s).

gabriel 34 days ago

Resolved — superseded by #45 and #46

The 3m 57s push time this issue diagnosed is gone. Actual numbers on a 1,043-commit / 5,197-object repo today:

Step	Before	After
Object storage (7,181 individual MinIO PUTs)	~158s	0s — eliminated
Bundle PUT (one presigned PUT)	—	0.363s
Server unpack + DB index	~140s	0.217s
Total	~2m 44s	3.82s

Root cause (identified in #44, fixed in #45): every push serialized objects individually to MinIO. The fix packs everything into one MPackBundle, uploads it as a single presigned PUT, and has the server unpack it in one pass.
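For illustration only, the single-bundle idea can be sketched as a length-prefixed concatenation that the client uploads in one PUT and the server walks in one pass. The actual MPackBundle format is not described in this issue, so the layout below (an object count, then per object: id length, payload length, id, payload) is an assumption, and pack_bundle/unpack_bundle are hypothetical names.

```python
from __future__ import annotations

import struct

HEADER = struct.Struct(">I")  # object count (big-endian uint32)
ENTRY = struct.Struct(">HI")  # id length (uint16), payload length (uint32)


def pack_bundle(objects: dict[str, bytes]) -> bytes:
    """Concatenate every object into one buffer so the client needs a single PUT."""
    parts = [HEADER.pack(len(objects))]
    for oid, data in objects.items():
        oid_bytes = oid.encode("utf-8")
        parts.append(ENTRY.pack(len(oid_bytes), len(data)))
        parts.append(oid_bytes)
        parts.append(data)
    return b"".join(parts)


def unpack_bundle(blob: bytes) -> dict[str, bytes]:
    """Walk the buffer once on the server, rejecting truncated or padded input."""
    (count,) = HEADER.unpack_from(blob, 0)
    offset = HEADER.size
    objects: dict[str, bytes] = {}
    for _ in range(count):
        oid_len, data_len = ENTRY.unpack_from(blob, offset)
        offset += ENTRY.size
        end = offset + oid_len + data_len
        if end > len(blob):
            raise ValueError("truncated bundle")
        oid = blob[offset:offset + oid_len].decode("utf-8")
        objects[oid] = blob[offset + oid_len:end]
        offset = end
    if offset != len(blob):
        raise ValueError("trailing bytes after last object")
    return objects


objects = {f"obj{i}": bytes([i % 256]) * (i % 7) for i in range(1000)}
blob = pack_bundle(objects)
assert unpack_bundle(blob) == objects
print(len(objects), "objects in one buffer of", len(blob), "bytes")
```

Whatever the real encoding, the cost model changes from one storage request per object to one request per push.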

Closing as resolved.

Assignee

gabriel human

Release

no commits linked to this issue

