All objects (commits, snapshots, blobs) must be written to the object store — S3 canonical, DB is index
Problem
The object store invariant is broken. Blobs are correctly written to S3 with a DB index row pointing to storage_uri. Commits and snapshots are not — they are DB-only, with no corresponding S3 bytes. This means:
- S3 loss leaves commit history and snapshot manifests intact in DB; DB loss destroys them entirely
- muse pull after a server-side merge fails because the merge commit was never written to S3 and wire_fetch_mpack has to reconstruct it from DB fields rather than serving canonical bytes
- The on-disk format for snapshots in the DB (manifest_blob: msgpack) diverges from the canonical muse binary format (snapshot <size>\0<json>) used by the local object store and intended for the wire protocol
- msgpack is used in the manifest_blob/delta_blob DB columns; these are not wire-format contexts and should not use msgpack
Correct invariant
| Type | S3 (source of truth) | DB (queryable index / cache) |
|---|---|---|
| Blob | raw bytes | MusehubObject: id, path, size, storage_uri |
| Snapshot | snapshot <size>\0<json> | MusehubSnapshot: id, entry_count, dirs, storage_uri, manifest_blob (cache) |
| Commit | commit <size>\0<json> | MusehubCommit: id, branch, parents, message, storage_uri |
The muse binary format (commit, snapshot, blob type-prefixed) is the canonical format everywhere — local object store, S3, and wire. msgpack is wire-envelope only (the outer mpack framing), not a storage format.
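A minimal sketch of the type-prefixed framing described above, for commits and snapshots. It assumes that <size> is the decimal byte length of the JSON payload and that the JSON is serialized deterministically (sorted keys, compact separators); neither detail is stated in this issue, and encode_object / decode_object are hypothetical helper names, not existing muse functions.

```python
import json

# Object kinds that carry a JSON payload in this sketch.
JSON_KINDS = {"commit", "snapshot"}


def encode_object(kind: str, payload: dict) -> bytes:
    """Frame payload as b"<kind> <size>\\0<json>".

    Assumption: <size> is the byte length of the JSON body, and the JSON is
    serialized with sorted keys and compact separators.
    """
    if kind not in JSON_KINDS:
        raise ValueError(f"unsupported kind: {kind!r}")
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    header = f"{kind} {len(body)}".encode("ascii")
    return header + b"\0" + body


def decode_object(data: bytes) -> tuple:
    """Parse framed bytes back into (kind, payload), validating the header."""
    header, sep, body = data.partition(b"\0")
    if not sep:
        raise ValueError("missing NUL separator after header")
    kind, _, size = header.decode("ascii").partition(" ")
    if kind not in JSON_KINDS:
        raise ValueError(f"unexpected kind: {kind!r}")
    if not size.isdigit() or int(size) != len(body):
        raise ValueError("declared size does not match payload length")
    return kind, json.loads(body)


raw = encode_object("commit", {"message": "init", "parents": []})
kind, payload = decode_object(raw)
```

A msgpack-encoded value has no such header, so decode_object rejects it; that is the property the "not msgpack" assertions in the test plan rely on.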
Blast radius
Commit write sites (DB-only today — each needs backend.put(commit_id, commit_bytes))
- musehub/services/musehub_proposals.py:753 (merge_proposal merge commit)
- musehub/services/musehub_repository.py:232 (repo init commit)
- musehub/services/musehub_sync.py:234 (sync path commit)
- musehub/services/musehub_sync.py:527 (commit_files_to_repo)
- musehub/services/musehub_wire_push.py:843 (push receive bulk insert)
Snapshot write sites (DB-only with msgpack today — each needs backend.put(snapshot_id, snapshot_bytes))
- musehub/services/musehub_snapshot.py:187 (upsert_snapshot_entries, primary path)
- musehub/services/musehub_wire_push.py:207 (push receive merge)
- musehub/services/musehub_wire_push.py:683 (push receive bulk insert)
manifest_blob / delta_blob readers (heavy — keep as DB cache, read from S3 on miss)
- musehub/services/musehub_snapshot.py (manifest decode helpers)
- musehub/services/musehub_wire_fetch.py (fetch delta computation; lines 240, 259, 437, 466, 875, 888)
- musehub/services/musehub_wire_shared.py (_snap_row_to_wire and manifest chain walk)
- musehub/services/musehub_intel_providers.py (10+ read sites for code intelligence)
- musehub/services/musehub_gc.py:151,168 (GC manifest walks)
- musehub/services/musehub_governance.py:105
- musehub/services/musehub_auth.py:241
- musehub/services/musehub_orgs.py:259
- musehub/services/musehub_social.py:153
- musehub/services/musehub_symbol_indexer.py:1121,1373
- musehub/api/routes/api/identities.py:293
msgpack packb write sites to eliminate from non-wire paths
- musehub/services/musehub_snapshot.py:184,227 (upsert_snapshot_entries and bulk upsert)
- musehub/services/musehub_wire_push.py:205,628,632 (push receive manifest/delta encoding)
_snap_row_to_wire call sites (3 — replaced by direct S3 bytes serve)
- musehub/services/musehub_wire_fetch.py:283,407,880
DB schema additions needed
- MusehubCommit: add storage_uri column
- MusehubSnapshot: add storage_uri column
Implementation — phased TDD plan
Phase 1 — Failing tests + DB schema
Write failing tests asserting:
- After merge_proposal, backend.exists(merge_commit_id) is True
- After merge_proposal, backend.exists(merged_snapshot_id) is True
- After push receive, backend.exists(commit_id) is True for every received commit
- After push receive, backend.exists(snapshot_id) is True for every received snapshot
- After repo init, backend.exists(init_commit_id) is True
- After commit_files_to_repo, backend.exists(commit_id) is True
- Fetched commit bytes from S3 decode as valid commit <size>\0<json> (not msgpack)
- Fetched snapshot bytes from S3 decode as valid snapshot <size>\0<json> (not msgpack)
Add Alembic migration: nullable storage_uri on musehub_commits and musehub_snapshots.
All tests must be RED before proceeding.
Phase 2 — Commit write path goes green
Add backend.put(commit_id, commit_bytes) at all 5 commit write sites.
Commit bytes format: commit <size>\0<json> (identical to muse local store).
Populate MusehubCommit.storage_uri from the returned URI.
Phase 1 commit tests go GREEN. Snapshot tests still RED.
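The Phase 2 change at each write site could look roughly like the sketch below. InMemoryBackend is a stand-in for the real S3 backend and only mirrors the two calls named in this issue (put returning a URI, and exists); CommitRow, commit_bytes and write_commit are hypothetical names, and the JSON field set is borrowed from the index columns in the table above rather than from the actual muse commit schema.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional


class InMemoryBackend:
    """Test double for the object store; not the real S3 backend."""

    def __init__(self, bucket: str = "example-bucket") -> None:
        self.bucket = bucket
        self.objects: dict = {}

    def put(self, object_id: str, data: bytes) -> str:
        # Store the bytes and return the URI the index row should record.
        self.objects[object_id] = data
        return f"s3://{self.bucket}/{object_id}"

    def exists(self, object_id: str) -> bool:
        return object_id in self.objects


@dataclass
class CommitRow:
    """Simplified stand-in for a MusehubCommit index row."""

    id: str
    branch: str
    parents: list
    message: str
    storage_uri: Optional[str] = None


def commit_bytes(row: CommitRow) -> bytes:
    # Assumption: the payload fields and the deterministic JSON encoding
    # are illustrative, not the confirmed muse commit schema.
    fields = {"branch": row.branch, "parents": row.parents, "message": row.message}
    body = json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return f"commit {len(body)}".encode("ascii") + b"\0" + body


def write_commit(backend: InMemoryBackend, row: CommitRow) -> CommitRow:
    # Write the canonical bytes first, then record the returned URI on the
    # index row, so the DB never points at an object that was not stored.
    row.storage_uri = backend.put(row.id, commit_bytes(row))
    return row


backend = InMemoryBackend()
row = write_commit(backend, CommitRow("abc123", "main", [], "init"))
```

Writing to the object store before persisting the row is a design choice in this sketch: a failed put leaves no dangling storage_uri, at the cost of a possible orphaned object if the DB insert fails afterwards.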
Phase 3 — Snapshot write path goes green
Add backend.put(snapshot_id, snapshot_bytes) at all 3 snapshot write sites.
Snapshot bytes format: snapshot <size>\0<json> (identical to muse local store).
Keep manifest_blob column populated from the same decoded dict (DB cache — avoids S3 round-trips for intel/GC hot paths).
Populate MusehubSnapshot.storage_uri from the returned URI.
All Phase 1 tests go GREEN.
Phase 4 — Wire fetch serves canonical bytes
Replace _snap_row_to_wire with direct S3 read of snapshot <size>\0<json> bytes.
Replace commit reconstruction in wire_fetch_mpack with direct S3 read of commit <size>\0<json> bytes.
Fall back to DB reconstruction when storage_uri is null (pre-backfill rows).
Add tests: fetched mpack contains bytes that decode cleanly as the canonical binary format.
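The read path with its pre-backfill fallback might be sketched as follows. backend.get is an assumed method (this issue only names put and exists), rows are plain dicts standing in for MusehubCommit records, and reconstruct_commit_bytes is a hypothetical placeholder for the existing DB-field reconstruction.

```python
import json


class InMemoryBackend:
    """Test double for the object store; get() is an assumed API."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, object_id: str, data: bytes) -> str:
        self._objects[object_id] = data
        return f"mem://{object_id}"

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]


def reconstruct_commit_bytes(row: dict) -> bytes:
    # Fallback for rows written before the backfill: rebuild the framing
    # from DB fields. The field set here is illustrative only.
    fields = {key: row[key] for key in ("branch", "parents", "message")}
    body = json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return f"commit {len(body)}".encode("ascii") + b"\0" + body


def load_commit_bytes(backend: InMemoryBackend, row: dict) -> bytes:
    # Prefer the canonical stored bytes; reconstruct only when the row
    # has no storage_uri yet.
    if row.get("storage_uri") is not None:
        return backend.get(row["id"])
    return reconstruct_commit_bytes(row)


backend = InMemoryBackend()
canonical = b'commit 2\x00{}'
stored_row = {"id": "c1", "storage_uri": backend.put("c1", canonical)}
legacy_row = {"id": "c0", "branch": "main", "parents": [], "message": "old",
              "storage_uri": None}
served_stored = load_commit_bytes(backend, stored_row)
served_legacy = load_commit_bytes(backend, legacy_row)
```

Once the backfill in Phase 5 leaves no null storage_uri rows, the fallback branch becomes unreachable and can be deleted together with the reconstruction helper.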
Phase 5 — Backfill existing objects
Migration script: for every MusehubCommit and MusehubSnapshot where storage_uri IS NULL, reconstruct canonical bytes from DB fields and write to S3.
Run on staging first. Verify muse pull works end-to-end for existing repos.
Mark backfill complete when zero null storage_uri rows remain.
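A rough sketch of the backfill loop, written so that it can be re-run safely. Rows are plain dicts in place of ORM objects, and backfill, canonical_bytes and remaining_nulls are hypothetical names; the real script would query the rows where storage_uri IS NULL and commit in batches.

```python
import json


class InMemoryBackend:
    """Test double for the object store; put returns the URI to record."""

    def __init__(self) -> None:
        self.objects = {}

    def put(self, object_id: str, data: bytes) -> str:
        self.objects[object_id] = data
        return f"mem://{object_id}"


def canonical_bytes(kind: str, fields: dict) -> bytes:
    # Assumption: deterministic JSON body, size = byte length of the body.
    body = json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return f"{kind} {len(body)}".encode("ascii") + b"\0" + body


def backfill(backend: InMemoryBackend, rows: list, kind: str, field_names: tuple) -> int:
    """Write canonical bytes for every row lacking storage_uri.

    Returns the number of objects written; rows that already have a
    storage_uri are skipped, so a second run writes nothing.
    """
    written = 0
    for row in rows:
        if row["storage_uri"] is not None:
            continue
        data = canonical_bytes(kind, {name: row[name] for name in field_names})
        row["storage_uri"] = backend.put(row["id"], data)
        written += 1
    return written


def remaining_nulls(rows: list) -> int:
    # Completion check: the backfill is done when this reaches zero.
    return sum(1 for row in rows if row["storage_uri"] is None)


backend = InMemoryBackend()
commits = [
    {"id": "c0", "branch": "main", "parents": [], "message": "old", "storage_uri": None},
    {"id": "c1", "branch": "main", "parents": ["c0"], "message": "new",
     "storage_uri": "mem://c1"},
]
first_run = backfill(backend, commits, "commit", ("branch", "parents", "message"))
second_run = backfill(backend, commits, "commit", ("branch", "parents", "message"))
```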
Phase 6 — Eliminate msgpack from non-wire paths
Remove msgpack from upsert_snapshot_entries and push receive snapshot write paths.
manifest_blob stays as a DB cache. Either switch its encoding from msgpack bytes to JSON if that simplifies the column, or keep msgpack for compactness with a clear comment that the column is a cache, not the source of truth.
Remove delta_blob-based manifest reconstruction from _snap_row_to_wire (now dead code).
Clean up all non-wire msgpack packb/unpackb call sites.
Definition of done
- Every commit and snapshot written anywhere in musehub results in a corresponding S3 object in muse binary format
- muse pull from staging after a server-side merge commit works end-to-end
- muse verify on a freshly pulled repo passes
- S3 is the disaster-recovery source of truth: a fresh DB seeded only from S3 can reconstruct all commit and snapshot metadata
- No msgpack in storage paths — only in wire-framing