gabriel / musehub public
perf BREAKING task/perf-push-pipeline #1 / 1
gabriel · 75 days ago · Apr 9, 2026 · Diff

perf: eliminate O(N×M) snapshot row explosion and parallelise object uploads

The wire push pipeline now scales to large pushes (hundreds of commits, thousands of files per snapshot):

1. manifest_blob (migration 0031): snapshot manifests stored as a single msgpack BYTEA blob per snapshot instead of N×M rows in snapshot_entries. 906 commits × 4000 files = 3.6M rows → 906 rows.

2. wire_push_objects bulk INSERT: replaced N session.add() + N/50 flush() with pg_insert(...).on_conflict_do_nothing() — one round-trip for any batch size.

3. SHA-256 verification off event loop: asyncio.to_thread() so concurrent chunk uploads are not serialised by CPU-bound hashing.

4. Concurrent storage: asyncio.gather() writes all new object bytes in parallel instead of sequentially.

5. Symbol indexer OOM cap (_MAX_INDEX_COMMITS_PER_PUSH=50): prevents server crash on first push of a large repo while job queue is built.

Tests: 5 new perf regression tests enforce concrete latency budgets:

- bulk_upsert 100×500 snapshots < 3s
- bulk_upsert 1000×100 snapshots < 5s
- wire_push_objects 500 new objects < 1s
- wire_push_objects 500 existing objects < 200ms (1 SELECT IN, zero writes)
- 200-commit push does not OOM server
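
A latency-budget assertion of this kind might look roughly like the sketch below. The `latency_budget` helper and `fake_bulk_upsert` are hypothetical names used only for illustration; the actual tests and the function they exercise are not shown in this message.

```python
import time
from contextlib import contextmanager


@contextmanager
def latency_budget(seconds: float):
    """Fail if the wrapped block takes `seconds` or longer (wall clock)."""
    start = time.perf_counter()
    yield
    elapsed = time.perf_counter() - start
    assert elapsed < seconds, f"took {elapsed:.3f}s, budget {seconds:.3f}s"


def fake_bulk_upsert(n_snapshots: int, files_per_snapshot: int) -> int:
    # Hypothetical stand-in: builds one row per snapshot, mirroring the
    # one-blob-per-snapshot layout, and returns the number of rows.
    rows = [{"snapshot": s, "entries": files_per_snapshot} for s in range(n_snapshots)]
    return len(rows)


# Usage: 100 snapshots x 500 files must finish inside the 3s budget.
with latency_budget(3.0):
    rows = fake_bulk_upsert(100, 500)
```

Wall-clock budgets like these depend on the machine running the tests, so the thresholds only act as a coarse regression guard.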

sha256:e649c062850aae26b3d40684a22d5bda0443feca8f348bb6f37ddf6046e965e7 sha
sha256:7bdb93d56070d556afb24d4e3dfa0684cbba52ae00af88c382ec8bf4cab80910 snapshot