gabriel / musehub public
perf patch main #4 / 33
AI Agent gabriel · May 10, 2026

perf: eliminate MinIO ghost-check from push/stream hot path

Phase 3b was doing asyncio.gather(backend.exists()) against every object referenced by new snapshots — 8791 MinIO HEAD requests at ~22ms each with Semaphore(50), taking ~197s per push.

Root cause: fighting the content-addressed guarantee instead of trusting it. Every object_id is sha256:<hex> — the ID IS the content. push/confirm already verified storage existence before writing musehub_objects. Doing it again in push/stream is redundant and architecturally incorrect.
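The content-addressed guarantee can be illustrated with a minimal sketch. The helper names make_object_id and verify_object are hypothetical and not taken from the musehub code; the only detail assumed from this commit is the sha256:<hex> ID format.

```python
import hashlib

PREFIX = "sha256:"


def make_object_id(content: bytes) -> str:
    """Derive a content-addressed ID (hypothetical helper, not musehub's API)."""
    return PREFIX + hashlib.sha256(content).hexdigest()


def verify_object(object_id: str, content: bytes) -> bool:
    """Return True if the bytes hash to the claimed ID."""
    return object_id == make_object_id(content)
```

Because the ID is derived from the bytes, an ID that was verified once at push/confirm time cannot later refer to different content; only availability of the bytes can change.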

Fix:
- Extract _check_missing_objects(session, needs_check): one IN query, no MinIO. Returns object_ids absent from musehub_objects entirely.
- Phase 3b: reject only objects not registered in the DB. Objects in musehub_objects are trusted; storage availability is a read-time concern and background-job responsibility, not a push gate.
- Phase 7 snapshots: remove per-batch session.commit(). A single atomic commit in phase 10 covers snapshots + commits + branch update.
- Phase 8 commits: same, remove per-batch session.commit().
- Migration 0052: index on musehub_object_refs(object_id) for bulk lookups.
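A rough sketch of the DB-only check, under stated assumptions: it uses the stdlib sqlite3 module instead of the project's real session object, takes a plain connection, and assumes musehub_objects has an object_id column. The chunk_size parameter is an addition for illustration (it keeps the IN list under the database's bound-parameter limit); with a small input it issues exactly one query. The actual _check_missing_objects may differ.

```python
import sqlite3
from typing import Iterable, Set


def check_missing_objects(
    conn: sqlite3.Connection,
    needs_check: Iterable[str],
    chunk_size: int = 500,  # assumption: stay below the bound-parameter limit
) -> Set[str]:
    """Return the object_ids that have no row in musehub_objects.

    No storage backend is contacted; only the database is queried.
    """
    ids = list(dict.fromkeys(needs_check))  # de-duplicate, keep order
    if not ids:
        return set()

    found: Set[str] = set()
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            "SELECT object_id FROM musehub_objects "
            f"WHERE object_id IN ({placeholders})",
            chunk,
        )
        found.update(row[0] for row in rows)

    return set(ids) - found
```

A push would then be rejected only when the returned set is non-empty, so the cost scales with the number of DB round trips rather than the number of objects.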

Per-phase timing before → after:
- phase 3b (ghost check): 197s → ~1.8s (one DB IN query)
- phase 7 (snapshots): 6.5s → ~0.7s (no per-batch commit)
- phase 8 (commits): 14.7s → ~5s (no per-batch commit)
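The figures above can be cross-checked with a little arithmetic. This sketch only restates numbers from this commit message; the "after" values are the approximate ones quoted, and the variable names are illustrative.

```python
# Seconds per phase, as reported in this commit message.
before = {"3b": 197.0, "7": 6.5, "8": 14.7}
after = {"3b": 1.8, "7": 0.7, "8": 5.0}  # approximate ("~") values

# Cost of the HEAD requests if they ran strictly one after another.
serial_head_seconds = 8791 * 22 / 1000

total_before = sum(before.values())
total_after = sum(after.values())
speedup = total_before / total_after

print(f"serial HEAD estimate: {serial_head_seconds:.1f}s")
print(f"three phases: {total_before:.1f}s -> {total_after:.1f}s ({speedup:.0f}x)")
```

The serial estimate comes to about 193s, close to the reported ~197s for phase 3b, and the three phases together drop from about 218s to about 7.5s, roughly a 29x reduction.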

Tests: 6 new property tests in test_push_stream_ghost_skip.py (P1–P6). Updated C7 and i2 to reflect correct architecture.

sha256:ca1ebb9cb6e4673fa0b75a7c53bab235ef61adaa7ae38885f591b5ad23c66378 sha
sha256:643436e6a445d1fdb256a1038e8faaa2700da537c764483477940387ad392389 snapshot
