Closed
#44
Performance
Bundle-based push: replace N individual object PUTs with one pack upload
Problem
Every push today decomposes into N individual HTTP round-trips to the object store:
- Presign path: 5,139 individual PUT requests to MinIO/R2 (one per object)
- Phase 7: 1,018 individual PUT requests for snapshot blobs
- Phase 8: 1,024 individual PUT requests for commit blobs
At ~22ms per MinIO round-trip: 7,181 × 22ms = 158 seconds minimum, just in object storage I/O. This is why a 1,024-commit push takes 2m 44s. Git does the equivalent in one transfer.
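The arithmetic above can be checked in a few lines of Python. This is a back-of-the-envelope sketch that assumes every request is issued serially at a constant ~22ms round-trip; the constant names are illustrative only.

```python
# Request counts quoted in the problem statement.
PRESIGN_PUTS = 5_139    # one PUT per file object
SNAPSHOT_PUTS = 1_018   # phase 7
COMMIT_PUTS = 1_024     # phase 8
ROUND_TRIP_S = 0.022    # ~22ms per MinIO round-trip (assumed constant, serial)

total_requests = PRESIGN_PUTS + SNAPSHOT_PUTS + COMMIT_PUTS
total_seconds = total_requests * ROUND_TRIP_S

print(total_requests)        # 7181
print(round(total_seconds))  # 158
```

The measured 2m 44s (164 seconds) is only a few seconds above this floor, which suggests the push is dominated by round-trip latency rather than by bandwidth or CPU.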
Root cause
The push path does not use the bundle format. MPackBundle already exists and already carries commits + snapshots + objects in one self-contained msgpack file. The push path reinvents this, badly, by decomposing the bundle into individual round-trips.
Solution
Wire the push path through the bundle format. One upload, one server-side unpack, one index pass.
Client                              Server
──────                              ──────
1. Serialize push delta as bundle
2. POST /push/bundle/presign    →   issue one presigned PUT URL for bundle
3. PUT bundle.muse → MinIO      →   one HTTP request (all objects + commits + snapshots)
4. POST /push/bundle/confirm    →   server reads bundle from MinIO
                                    unpacks objects → indexes into PG
                                    updates branch ref
                                ←   result frame
Implementation — ordered, atomic steps
Step 1: Client — bundle serialization
- `muse push` already calls `muse bundle create` logic internally for local transfer
- Modify the presign path decision: instead of collecting object IDs and requesting N presigned URLs, serialize the entire push delta as an in-memory `MPackBundle`
- The bundle carries: all file objects, all snapshot manifests, all commits
- Output: `bundle_bytes: bytes`, `bundle_id: str` (sha256 of bundle bytes)
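A minimal sketch of Step 1, showing how the two outputs relate. `PushDelta` and `serialize_bundle` are hypothetical names, and JSON stands in for the msgpack encoding so the example stays stdlib-only; the real `MPackBundle` layout is not reproduced here.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class PushDelta:
    """Hypothetical container for everything one push needs to send."""
    objects: dict[str, bytes] = field(default_factory=dict)  # object_id -> bytes
    snapshots: list[dict] = field(default_factory=list)      # snapshot manifests
    commits: list[dict] = field(default_factory=list)        # commit records


def serialize_bundle(delta: PushDelta) -> tuple[bytes, str]:
    """Return (bundle_bytes, bundle_id) for a push delta.

    JSON is an assumption made for this sketch; the real bundle is msgpack.
    Keys are sorted so the same delta always yields the same bytes.
    """
    payload = {
        "objects": {oid: data.hex() for oid, data in delta.objects.items()},
        "snapshots": delta.snapshots,
        "commits": delta.commits,
    }
    bundle_bytes = json.dumps(
        payload, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    # The bundle is content-addressed: its ID is the sha256 of its own bytes.
    bundle_id = hashlib.sha256(bundle_bytes).hexdigest()
    return bundle_bytes, bundle_id
```

Deterministic serialization matters because `bundle_id` is derived from the bytes: two clients pushing the same delta should arrive at the same key.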
Step 2: Server — POST /push/bundle/presign
- Accept: `{bundle_id: str, size_bytes: int}`
- Call `backend.presign_put(bundle_id, ttl=3600)`: one presigned URL for the whole bundle
- Return: `{presigned_url: str, bundle_id: str}`
- No object-by-object existence check. No N presigned URLs. One URL.
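The presign handler could look roughly like the sketch below. Only `presign_put(bundle_id, ttl=3600)` and the request and response shapes come from the steps above; `handle_bundle_presign`, `FakeBackend`, the placeholder URL, and the input validation are assumptions added for illustration.

```python
import re

SHA256_HEX = re.compile(r"[0-9a-f]{64}")


class FakeBackend:
    """Stand-in for the storage backend; only presign_put is modelled."""

    def presign_put(self, key: str, ttl: int) -> str:
        # A real backend would return a signed MinIO/R2 URL.
        return f"https://objects.example.invalid/{key}?expires={ttl}"


def handle_bundle_presign(body: dict, backend) -> dict:
    """Issue exactly one presigned PUT URL for the whole bundle."""
    bundle_id = body["bundle_id"]
    size_bytes = body["size_bytes"]
    # Validation is an assumption of this sketch, not part of the spec above.
    if not SHA256_HEX.fullmatch(bundle_id):
        raise ValueError("bundle_id must be a lowercase hex sha256 digest")
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    url = backend.presign_put(bundle_id, ttl=3600)
    return {"presigned_url": url, "bundle_id": bundle_id}
```

Unlike the per-object presign path, the handler does no existence checks and no looping, so its cost is independent of how many objects the bundle contains.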
Step 3: Client — single PUT
- PUT `bundle_bytes` to the presigned URL
- One HTTP request, all data
- No loop, no semaphore, no concurrency management needed
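As a sketch of Step 3, the whole upload reduces to one request built with the standard library. The function names and the timeout are hypothetical, and the real client may well use a different HTTP library.

```python
import urllib.request


def build_bundle_put(presigned_url: str, bundle_bytes: bytes) -> urllib.request.Request:
    """Build the single PUT request that carries the whole bundle."""
    return urllib.request.Request(
        presigned_url,
        data=bundle_bytes,
        method="PUT",
        # Set explicitly; urllib would otherwise default to a form content type.
        headers={"Content-Type": "application/octet-stream"},
    )


def upload_bundle(presigned_url: str, bundle_bytes: bytes, timeout: float = 300.0) -> int:
    """Send the bundle and return the HTTP status code."""
    request = build_bundle_put(presigned_url, bundle_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

If the presigned URL signs specific headers, the headers sent here would have to match what the server signed; that detail depends on how `presign_put` is configured and is not covered by this sketch.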
Step 4: Server — POST /push/bundle/confirm
- Accept: `{bundle_id: str, branch: str, head: str, force: bool}`
- Read bundle from MinIO: `data = await backend.get(bundle_id)`
- Call `bundle_unpack(data)` → yields `(object_id, bytes)` for each object, plus commit and snapshot records
- For each object: write to MinIO by object_id (parallel, using S3 thread pool). This replaces the N individual client PUTs with N server-side parallel writes that share one warm connection pool
- Bulk INSERT objects into `musehub_objects` (chunked at 500 rows)
- Bulk INSERT snapshot records into `musehub_snapshots`
- Bulk INSERT commit records into `musehub_commits`
- Upsert object refs (chunked at 1,000 rows)
- Update branch ref
- Return: `{ok: true, stored_commits: N, stored_objects: M}`
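The confirm flow in Step 4 can be sketched with in-memory stand-ins. Everything here except the table names, the 500-row chunk size, and the response shape is an assumption: `InMemoryBackend`, `InMemoryIndex`, the `unpack` callable, and `max_parallel` are hypothetical, and an `asyncio.Semaphore` replaces the S3 thread pool to keep the example self-contained. The object-ref upsert and branch-ref update are omitted.

```python
import asyncio
from typing import Iterator, Sequence


def chunked(rows: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


class InMemoryBackend:
    """Stand-in blob store with async get/put."""
    def __init__(self):
        self.blobs = {}

    async def get(self, key):
        return self.blobs[key]

    async def put(self, key, data):
        self.blobs[key] = data


class InMemoryIndex:
    """Stand-in for the PG index; records the size of each bulk INSERT."""
    def __init__(self):
        self.batches = []

    async def bulk_insert(self, table, rows):
        self.batches.append((table, len(rows)))


async def confirm_bundle(bundle_id, unpack, backend, index, max_parallel=32):
    data = await backend.get(bundle_id)
    objects, snapshots, commits = unpack(data)

    # Bounded parallel writes, standing in for the S3 thread pool.
    gate = asyncio.Semaphore(max_parallel)

    async def store(object_id, payload):
        async with gate:
            await backend.put(object_id, payload)

    await asyncio.gather(*(store(oid, payload) for oid, payload in objects))

    object_rows = [{"object_id": oid, "size": len(p)} for oid, p in objects]
    for batch in chunked(object_rows, 500):
        await index.bulk_insert("musehub_objects", batch)
    await index.bulk_insert("musehub_snapshots", snapshots)
    await index.bulk_insert("musehub_commits", commits)
    return {"ok": True, "stored_commits": len(commits), "stored_objects": len(objects)}
```

Chunking keeps each INSERT statement bounded in size, and the semaphore caps how many object writes are in flight at once so a large bundle cannot exhaust the connection pool.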
Step 5: Remove old presign path (or keep as fallback during transition)
- `POST /push/presign` → `POST /push/confirm` path becomes legacy
- Can keep alive behind a flag for one release cycle, then delete
Step 6: Remove phase 7 and phase 8 individual MinIO PUTs from push/stream
- Currently `wire_push_stream` does individual `backend.put()` calls for snapshot blobs (phase 7) and commit blobs (phase 8)
- These are redundant: commit and snapshot canonical bytes are already in PG
- Remove the MinIO PUTs from phases 7 and 8
- `push/stream` (small push path, <500 objects) becomes: receive frames → write objects to MinIO (already batched at 64) → bulk INSERT commits+snapshots into PG → done
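Taken together, Steps 5 and 6 leave three possible routes for a push. The selector below is a hypothetical sketch of that decision: the 500-object threshold is the one quoted for the small push path, while `choose_push_path`, the `legacy_presign` flag, and the returned labels are assumptions.

```python
SMALL_PUSH_LIMIT = 500  # objects; threshold quoted for the push/stream path


def choose_push_path(object_count: int, legacy_presign: bool = False) -> str:
    """Pick the route for a push of `object_count` objects."""
    if object_count < SMALL_PUSH_LIMIT:
        return "push/stream"    # small pushes keep the wire frame format
    if legacy_presign:
        return "push/presign"   # old per-object path, kept behind a flag
    return "push/bundle"        # new single-upload path


print(choose_push_path(42))                          # push/stream
print(choose_push_path(5_139))                       # push/bundle
print(choose_push_path(5_139, legacy_presign=True))  # push/presign
```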
Success criteria (measurable)
| Metric | Before | Target |
|---|---|---|
| MinIO round-trips per push (5,139 objects) | 7,181 | ~5,139 (server-side parallel, warm pool) |
| Wall-clock time, 1,024-commit fresh push | 2m 44s | <15s |
| Phase 7 (snapshot blob PUTs) | 22s | 0s (removed) |
| Phase 8 (commit blob PUTs) | 22s | 0s (removed) |
| Client-side object PUT loop | ~113s (serial) | ~1s (one bundle PUT) |
Files affected
- `muse/muse/push.py` (or equivalent push command): client side
- `musehub/musehub/services/musehub_wire.py`: phases 7, 8; new bundle confirm handler
- `musehub/musehub/storage/backends.py`: no changes needed (presign_put already correct)
- `musehub/musehub/views/wire_views.py` (or equivalent): new routes for bundle/presign and bundle/confirm
- `musehub/tests/`: new test for bundle push path
What does NOT change
- Content-addressing: all objects still identified by sha256
- MinIO as the blob store: file objects still live there
- PostgreSQL as the index: commits, snapshots, object refs still indexed there
- The wire frame format (H/O/C/E): used only for the small push path (<500 objects)
- Ed25519 commit signatures: verified server-side during bundle/confirm unpack
- Ghost guard: eliminated by design — server does the unpack, server knows what arrived
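Because content-addressing is preserved, the server can re-derive every object ID while unpacking. The check below is a minimal sketch that assumes an object ID is the lowercase hex sha256 of the object's raw bytes, with no prefix or framing; `find_mismatched_objects` is a hypothetical helper, not an existing function.

```python
import hashlib
from typing import Iterable


def find_mismatched_objects(objects: Iterable[tuple[str, bytes]]) -> list[str]:
    """Return the claimed IDs whose bytes do not hash to that ID."""
    return [
        object_id
        for object_id, data in objects
        if hashlib.sha256(data).hexdigest() != object_id
    ]


good = b"hello"
good_id = hashlib.sha256(good).hexdigest()
print(find_mismatched_objects([(good_id, good), ("0" * 64, b"tampered")]))
```

The same comparison applies one level up: the bytes read back from MinIO during confirm should hash to the `bundle_id` the client declared at presign time.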
Activity
Implemented — this analysis is now live code
The architectural decision documented here (one MPackBundle replaces N individual object PUTs) was implemented in #45 and is running in production.
Key decisions that landed exactly as specified:
- `bundle_id = sha256(bundle_bytes)` as the content-addressed key
- `bundle.index` job

Security layer (issue #49, now closed) built on top of this foundation: size gates, zip-bomb detection, sha256 mismatch quarantine, blocked-hash check, DMCA takedown, daily byte limits.
Closing as implemented.