Closed #44 Performance

filed by gabriel human · 35 days ago

Bundle-based push: replace N individual object PUTs with one pack upload


Problem

Every push today decomposes into N individual HTTP round-trips to the object store:

Presign path: 5,139 individual PUT requests to MinIO/R2 (one per object)
Phase 7: 1,018 individual PUT requests for snapshot blobs
Phase 8: 1,024 individual PUT requests for commit blobs

At ~22ms per MinIO round-trip, with the requests issued one after another: 7,181 × 22ms ≈ 158 seconds, spent purely on object storage I/O. That accounts for almost all of the 2m 44s (164 seconds) a 1,024-commit push takes. Git does the equivalent in one transfer.
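
The figures above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes strictly serial requests at a flat 22 ms each, and the variable names are illustrative rather than taken from the codebase.

```python
# Back-of-the-envelope check of the round-trip cost quoted in this issue.
# Assumption: requests run strictly one after another at a flat 22 ms each.
ROUND_TRIP_MS = 22

puts = {
    "presign path (objects)": 5_139,
    "phase 7 (snapshot blobs)": 1_018,
    "phase 8 (commit blobs)": 1_024,
}

total_puts = sum(puts.values())
total_seconds = total_puts * ROUND_TRIP_MS / 1000

for name, count in puts.items():
    print(f"{name}: {count} PUTs, {count * ROUND_TRIP_MS / 1000:.1f} s")
print(f"total: {total_puts} PUTs, {total_seconds:.0f} s")
```

The per-phase lines come out at roughly 113 s, 22 s, and 22 s, which matches the "Before" column in the success criteria.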

Root cause

The push path does not use the bundle format. MPackBundle already exists and already carries commits + snapshots + objects in one self-contained msgpack file. The push path reinvents this, badly, by decomposing the bundle into individual round-trips.

Solution

Wire the push path through the bundle format. One upload, one server-side unpack, one index pass.

Client                                     Server
──────                                     ──────
1. Serialize push delta as bundle       
2. POST /push/bundle/presign            →  issue one presigned PUT URL for bundle
3. PUT bundle.muse → MinIO             →  one HTTP request (all objects+commits+snaps)
4. POST /push/bundle/confirm            →  server reads bundle from MinIO
                                           unpacks objects → indexes into PG
                                           updates branch ref
                                        ←  result frame

Implementation — ordered, atomic steps

Step 1: Client — bundle serialization

muse push already calls the muse bundle create logic internally for local transfers
Modify the presign path decision: instead of collecting object IDs and requesting N presigned URLs, serialize the entire push delta as an in-memory MPackBundle
The bundle carries: all file objects, all snapshot manifests, all commits
Output: bundle_bytes: bytes, bundle_id: str (sha256 of bundle bytes)
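
A minimal sketch of the Step 1 output, assuming the push delta is already in memory. JSON stands in for the serializer here because the real MPackBundle layout is not shown in this issue; `make_bundle` and its argument shapes are hypothetical. The only part taken from the text is that `bundle_id` is the sha256 of the bundle bytes.

```python
import hashlib
import json


def make_bundle(
    objects: dict[str, bytes],
    snapshots: list[dict],
    commits: list[dict],
) -> tuple[bytes, str]:
    """Serialize a push delta into one blob and derive its content-addressed ID.

    JSON is a stand-in only: the real MPackBundle is msgpack, and its field
    layout is not specified in this issue.
    """
    payload = {
        # Sort by object ID so identical deltas serialize to identical bytes.
        "objects": {oid: data.hex() for oid, data in sorted(objects.items())},
        "snapshots": snapshots,
        "commits": commits,
    }
    bundle_bytes = json.dumps(
        payload, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    # bundle_id = sha256 of the bundle bytes, as stated in Step 1.
    # Hex encoding without a prefix is an assumption.
    bundle_id = hashlib.sha256(bundle_bytes).hexdigest()
    return bundle_bytes, bundle_id
```

Because the ID is derived from the bytes, serialization needs to be deterministic if two clients pushing the same delta are expected to produce the same `bundle_id`.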

Step 2: Server — `POST /push/bundle/presign`

Accept: {bundle_id: str, size_bytes: int}
Call backend.presign_put(bundle_id, ttl=3600) — one presigned URL for the whole bundle
Return: {presigned_url: str, bundle_id: str}
No object-by-object existence check. No N presigned URLs. One URL.
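
The handler logic for Step 2 might look like the sketch below. It is framework-agnostic and runs against a fake backend; `Backend`, `FakeBackend`, and `bundle_presign` are hypothetical names, and treating `presign_put` as a synchronous call that returns a URL string is an assumption. Only the request and response shapes and the `ttl=3600` call come from the steps above.

```python
from dataclasses import dataclass
from typing import Protocol


class Backend(Protocol):
    # presign_put is named in the issue; its exact signature is assumed here.
    def presign_put(self, key: str, ttl: int) -> str: ...


@dataclass
class FakeBackend:
    """Hypothetical stand-in so the sketch runs without MinIO."""

    base_url: str = "https://objects.example.invalid"

    def presign_put(self, key: str, ttl: int) -> str:
        return f"{self.base_url}/{key}?ttl={ttl}"


def bundle_presign(backend: Backend, body: dict) -> dict:
    """Validate {bundle_id, size_bytes} and issue exactly one presigned URL."""
    bundle_id = body.get("bundle_id")
    size_bytes = body.get("size_bytes")
    if not isinstance(bundle_id, str) or not bundle_id:
        raise ValueError("bundle_id must be a non-empty string")
    if not isinstance(size_bytes, int) or size_bytes <= 0:
        raise ValueError("size_bytes must be a positive integer")
    # One URL for the whole bundle; no per-object existence checks.
    url = backend.presign_put(bundle_id, ttl=3600)
    return {"presigned_url": url, "bundle_id": bundle_id}
```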

Step 3: Client — single PUT

PUT bundle_bytes to the presigned URL
One HTTP request, all data
No loop, no semaphore, no concurrency management needed
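
As an illustration of Step 3, the upload reduces to one request. The sketch uses only the standard library; `build_bundle_put` and `upload_bundle` are hypothetical helper names, and the real client may well use a different HTTP library.

```python
import urllib.request


def build_bundle_put(presigned_url: str, bundle_bytes: bytes) -> urllib.request.Request:
    """Build the single PUT request that carries the whole bundle."""
    return urllib.request.Request(presigned_url, data=bundle_bytes, method="PUT")


def upload_bundle(presigned_url: str, bundle_bytes: bytes, timeout: float = 60.0) -> int:
    """Send the bundle and return the HTTP status code.

    urlopen raises urllib.error.HTTPError on 4xx/5xx, so a normal return
    means the object store accepted the upload.
    """
    request = build_bundle_put(presigned_url, bundle_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

No extra headers are set in this sketch, since a presigned URL can reject headers that were not part of the signature.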

Step 4: Server — `POST /push/bundle/confirm`

Accept: {bundle_id: str, branch: str, head: str, force: bool}
Read bundle from MinIO: data = await backend.get(bundle_id)
Call bundle_unpack(data) → yields (object_id, bytes) for each object, plus commit and snapshot records
For each object: write to MinIO by object_id (parallel, using S3 thread pool) — this replaces the N individual client PUTs with N server-side parallel writes that share one warm connection pool
Bulk INSERT objects into musehub_objects (chunked at 500 rows)
Bulk INSERT snapshot records into musehub_snapshots
Bulk INSERT commit records into musehub_commits
Upsert object refs (chunked at 1,000 rows)
Update branch ref
Return: {ok: true, stored_commits: N, stored_objects: M}
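
Two pieces of Step 4 are easy to sketch in isolation: the parallel object writes and the chunked bulk INSERTs. The helpers below are hypothetical and use an in-memory dict in place of MinIO; the chunk sizes of 500 and 1,000 come from the steps above, while the worker count is an arbitrary placeholder.

```python
from collections.abc import Callable, Iterator, Sequence
from concurrent.futures import ThreadPoolExecutor
from typing import TypeVar

T = TypeVar("T")


def chunked(rows: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def write_objects_parallel(
    put: Callable[[str, bytes], None],
    objects: dict[str, bytes],
    max_workers: int = 16,  # placeholder value, not from the issue
) -> int:
    """Write every object through one shared thread pool."""

    def write_one(item: tuple[str, bytes]) -> None:
        put(item[0], item[1])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for completion and re-raises the first failure.
        list(pool.map(write_one, objects.items()))
    return len(objects)


object_ids = [f"obj-{i}" for i in range(5_139)]
insert_batches = list(chunked(object_ids, 500))   # musehub_objects INSERTs
ref_batches = list(chunked(object_ids, 1_000))    # object ref upserts
print(len(insert_batches), len(ref_batches))      # prints: 11 6
```

For the 5,139-object push used as the benchmark, that is 11 INSERT statements and 6 upserts instead of one statement per row.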

Step 5: Remove old presign path (or keep as fallback during transition)

POST /push/presign → POST /push/confirm path becomes legacy
Can keep alive behind a flag for one release cycle, then delete

Step 6: Remove phase 7 and phase 8 individual MinIO PUTs from `push/stream`

Currently wire_push_stream does individual backend.put() calls for snapshot blobs (phase 7) and commit blobs (phase 8)
These are redundant: commit and snapshot canonical bytes are already in PG
Remove the MinIO PUTs from phases 7 and 8
push/stream (small push path, <500 objects) becomes: receive frames → write objects to MinIO (already batched at 64) → bulk INSERT commits+snapshots into PG → done
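
A small sketch of how a client could route between the two paths, using the <500-object threshold and the batch size of 64 quoted above. `choose_push_path` and `stream_batch_count` are hypothetical names, and sending a push of exactly 500 objects to the bundle path is an assumption.

```python
import math

SMALL_PUSH_MAX_OBJECTS = 500  # push/stream handles fewer than this many objects
STREAM_BATCH_SIZE = 64        # object writes on push/stream are batched at 64


def choose_push_path(object_count: int) -> str:
    """Pick the wire path for a push of `object_count` objects."""
    if object_count < SMALL_PUSH_MAX_OBJECTS:
        return "push/stream"
    return "push/bundle"


def stream_batch_count(object_count: int) -> int:
    """Number of 64-object write batches a push/stream push needs."""
    return math.ceil(object_count / STREAM_BATCH_SIZE)
```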

Success criteria (measurable)

| Metric | Before | Target |
| --- | --- | --- |
| MinIO round-trips per push (5,139 objects) | 7,181 | ~5,139 (server-side parallel, warm pool) |
| Wall-clock time, 1,024-commit fresh push | 2m 44s | <15s |
| Phase 7 (snapshot blob PUTs) | 22s | 0s (removed) |
| Phase 8 (commit blob PUTs) | 22s | 0s (removed) |
| Client-side object PUT loop | ~113s (serial) | ~1s (one bundle PUT) |

Files affected

muse/muse/push.py (or equivalent push command) — client side
musehub/musehub/services/musehub_wire.py — phases 7, 8; new bundle confirm handler
musehub/musehub/storage/backends.py — no changes needed (presign_put already correct)
musehub/musehub/views/wire_views.py (or equivalent) — new routes for bundle/presign and bundle/confirm
musehub/tests/ — new test for bundle push path

What does NOT change

Content-addressing: all objects still identified by sha256
MinIO as the blob store: file objects still live there
PostgreSQL as the index: commits, snapshots, object refs still indexed there
The wire frame format (H/O/C/E): used only for the small push path (<500 objects)
Ed25519 commit signatures: verified server-side during bundle/confirm unpack
Ghost guard: eliminated by design — server does the unpack, server knows what arrived


gabriel 35 days ago

Implemented — this analysis is now live code

The architectural decision documented here (one MPackBundle replaces N individual object PUTs) was implemented in #45 and is running in production.

Key decisions that landed exactly as specified:

bundle_id = sha256(bundle_bytes) as the content-addressed key
Single presigned PUT URL for the whole bundle — one MinIO round-trip replaces 7,181
Server sync path: verify sha256, advance branch pointer, enqueue bundle.index job
Background job: decompress objects, write to MinIO, bulk INSERT into PG
Ghost guard deleted — the hash is the proof
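
The sha256 check on the sync path could be as small as the sketch below. `verify_bundle` is a hypothetical name, and accepting an optional `sha256:` prefix on the ID is an assumption, not something the comment states.

```python
import hashlib
import hmac


def verify_bundle(bundle_id: str, bundle_bytes: bytes) -> bool:
    """Return True when the uploaded bytes hash to the claimed bundle_id."""
    # Assumption: IDs are hex digests, optionally prefixed with "sha256:".
    claimed = bundle_id.removeprefix("sha256:").lower()
    actual = hashlib.sha256(bundle_bytes).hexdigest()
    # Constant-time comparison; both arguments are ASCII strings here.
    return hmac.compare_digest(actual, claimed)
```

A mismatch means the stored bytes are not what the client announced, so the branch pointer should not advance.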

Security layer (issue #49, now closed) built on top of this foundation: size gates, zip-bomb detection, sha256 mismatch quarantine, blocked-hash check, DMCA takedown, daily byte limits.

Closing as implemented.

Assignee

gabriel human

Release

no commits linked to this issue

