Replace 2-byte header with Git-idiomatic type prefix in unified object store
Problem
The unified object store (#58) was implemented with a 2-byte binary header [type_byte, version_byte] prepended to object payloads. This is not how Git works.
Git bakes the type into the content before hashing and storing:
store: "<type> <size>\0<payload>"
hash: sha256("<type> <size>\0<payload>")
on disk: objects/<2hex>/<62hex> ← same path layout, already correct
The type is recoverable by reading the object — no separate header needed. The object ID includes the type, so a blob and a commit with identical payload bytes will never collide.
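The no-collision claim can be checked directly. The following is a minimal sketch, assuming SHA-256 over the framed bytes as described above; the helper name git_style_id is hypothetical and not part of the codebase.

```python
import hashlib


def git_style_id(type_str: str, payload: bytes) -> str:
    """Hash "<type> <size>\\0<payload>" with SHA-256 and return the hex digest."""
    # The header carries the type and the payload length, terminated by a NUL byte.
    header = f"{type_str} {len(payload)}\0".encode("ascii")
    return hashlib.sha256(header + payload).hexdigest()


payload = b"hello"
blob_id = git_style_id("blob", payload)
commit_id = git_style_id("commit", payload)

# Same payload bytes, different type prefix, therefore different object IDs.
print(blob_id != commit_id)
```

Because the type is part of the hashed bytes, two objects can only share an ID if both their type and their payload match.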
What needs to change
Phase 1 — Replace the format primitives
- Remove OBJECT_TYPE_COMMIT, OBJECT_TYPE_SNAPSHOT, OBJECT_TYPE_BLOB, OBJECT_FORMAT_V1 constants
- Remove write_typed_object / read_typed_object from object_store.py
- Add write_git_object(repo, type_str, payload) -> str: prepends "<type> <size>\0", hashes the full string, writes to objects/<algo>/<2>/<62>, returns the object ID
- Add read_git_object(repo, object_id) -> tuple[str, bytes] | None: reads the file, parses up to \0 to extract type and size, returns (type_str, payload)
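One possible shape for the two new primitives is sketched below. It is not the actual implementation. It assumes that repo is a pathlib.Path pointing at the repository root, that objects are stored uncompressed under .muse/objects/sha256/, and that writes go through a temporary file plus os.replace; the _object_path helper and the ALGO constant are hypothetical names.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Optional, Tuple

ALGO = "sha256"  # assumption: a single algorithm directory under objects/


def _object_path(repo: Path, object_id: str) -> Path:
    # objects/<algo>/<first 2 hex chars>/<remaining 62 hex chars>
    return repo / ".muse" / "objects" / ALGO / object_id[:2] / object_id[2:]


def write_git_object(repo: Path, type_str: str, payload: bytes) -> str:
    """Frame the payload as "<type> <size>\\0<payload>", hash it, store it."""
    data = f"{type_str} {len(payload)}\0".encode("ascii") + payload
    object_id = hashlib.sha256(data).hexdigest()
    path = _object_path(repo, object_id)
    if path.exists():
        return object_id  # content-addressed: identical bytes are already stored
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename into place.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return object_id


def read_git_object(repo: Path, object_id: str) -> Optional[Tuple[str, bytes]]:
    """Return (type_str, payload), or None when the object does not exist."""
    try:
        data = _object_path(repo, object_id).read_bytes()
    except FileNotFoundError:
        return None
    header, sep, payload = data.partition(b"\0")
    if not sep:
        raise ValueError(f"object {object_id} has no NUL-terminated header")
    type_str, _, size_str = header.decode("ascii").partition(" ")
    if int(size_str) != len(payload):
        raise ValueError(f"object {object_id} has a size mismatch")
    return type_str, payload
```

Splitting on the first NUL byte only means a payload that itself contains NUL bytes still round-trips, and the size check catches truncated files.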
Phase 2 — Update ids.py
- hash_blob(data) → sha256("blob <size>\0" + data)
- hash_snapshot(manifest) → sha256("snapshot <size>\0" + canonical_bytes)
- hash_commit(...) → sha256("commit <size>\0" + canonical_bytes)
All existing IDs change. Migration of old objects is a separate pass (already planned).
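The Phase 2 mapping could look roughly like the sketch below. How ids.py actually produces canonical_bytes is not stated here, so sorted-key compact JSON stands in for it purely as an assumption, as does treating the manifest as a plain dict. The _hash_typed helper is a hypothetical name, and hash_commit is omitted because its arguments are not given.

```python
import hashlib
import json


def _hash_typed(type_str: str, body: bytes) -> str:
    # sha256("<type> <size>\0" + body)
    header = f"{type_str} {len(body)}\0".encode("ascii")
    return hashlib.sha256(header + body).hexdigest()


def hash_blob(data: bytes) -> str:
    return _hash_typed("blob", data)


def hash_snapshot(manifest: dict) -> str:
    # Assumption: stand-in canonicalization (sorted keys, no extra whitespace).
    # The real canonical encoding in ids.py may differ.
    canonical_bytes = json.dumps(
        manifest, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return _hash_typed("snapshot", canonical_bytes)


# Key order in the manifest must not affect the resulting ID.
same = hash_snapshot({"a.txt": "1", "b.txt": "2"}) == hash_snapshot(
    {"b.txt": "2", "a.txt": "1"}
)
print(same)
```

Whatever canonical encoding is used, it has to be deterministic, since any byte-level difference in canonical_bytes yields a different object ID.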
Phase 3 — Wire write_commit / read_commit
- write_commit dual-writes: msgpack to .muse/commits/ (old), git-object to .muse/objects/ (new)
- read_commit falls back to .muse/objects/ via read_git_object when old path absent
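The control flow of the dual-write and fallback read can be illustrated with a toy model. This is only a sketch: two in-memory dicts stand in for .muse/commits/ and .muse/objects/, JSON stands in for both msgpack and the canonical commit encoding, and the function signatures are simplified assumptions rather than the real ones in store.py.

```python
import hashlib
import json
from typing import Dict, Optional


def write_commit(
    legacy: Dict[str, bytes], objects: Dict[str, bytes], record: dict
) -> str:
    # Stand-in serialization; the real code uses msgpack and its own canonical bytes.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    framed = f"commit {len(payload)}\0".encode("ascii") + payload
    commit_id = hashlib.sha256(framed).hexdigest()
    legacy[commit_id] = payload   # old location (stands in for .muse/commits/)
    objects[commit_id] = framed   # new location (stands in for .muse/objects/)
    return commit_id


def read_commit(
    legacy: Dict[str, bytes], objects: Dict[str, bytes], commit_id: str
) -> Optional[dict]:
    raw = legacy.get(commit_id)
    if raw is None:
        # Old path absent: fall back to the object store and strip the header.
        framed = objects.get(commit_id)
        if framed is None:
            return None
        _, _, raw = framed.partition(b"\0")
    return json.loads(raw)
```

In this model, deleting an entry from legacy leaves read_commit working through the object store, which is the property the later migration relies on.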
Phase 4 — Wire write_snapshot / read_snapshot
Same pattern as Phase 3.
Phase 5 — Blobs
- write_object uses write_git_object with type_str="blob"
- read_object uses read_git_object and strips the header
Phase 6 — Tests
Rewrite tests/test_unified_object_store.py to use the new primitives throughout. Remove all references to the 2-byte header constants.
Phase 7 — Migration
Backfill all existing .muse/commits/ and .muse/snapshots/ msgpack objects into .muse/objects/ in the git-idiomatic format. Delete old directories once migration is verified.
Acceptance criteria
- All objects (blobs, snapshots, commits) live in .muse/objects/sha256/<2>/<62>
- On-disk format is "<type> <size>\0<payload>", with no binary header
- Object ID is sha256("<type> <size>\0<payload>") for all types
- Reading any object by ID yields its type without any separate metadata
- All 8 tests in test_unified_object_store.py pass against the new format
- No references to OBJECT_TYPE_* or OBJECT_FORMAT_V1 remain in the codebase
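The first three criteria lend themselves to an automated check. The sketch below walks an object directory and reports any file whose header is malformed or whose SHA-256 does not match its path. It assumes uncompressed objects and the three type names listed above; verify_object_store is a hypothetical helper, not an existing function.

```python
import hashlib
from pathlib import Path
from typing import List

KNOWN_TYPES = {"blob", "snapshot", "commit"}


def verify_object_store(objects_dir: Path) -> List[str]:
    """Return a list of problems; an empty list means every object checks out.

    objects_dir is the algorithm directory, e.g. .muse/objects/sha256.
    """
    problems = []
    for path in sorted(objects_dir.glob("*/*")):
        if not path.is_file():
            continue
        # The object ID is the 2-char directory name plus the 62-char file name.
        object_id = path.parent.name + path.name
        data = path.read_bytes()
        header, sep, payload = data.partition(b"\0")
        type_str, _, size_str = header.decode("ascii", "replace").partition(" ")
        if (
            not sep
            or type_str not in KNOWN_TYPES
            or not size_str.isdigit()
            or int(size_str) != len(payload)
        ):
            problems.append(f"{object_id}: malformed header")
        elif hashlib.sha256(data).hexdigest() != object_id:
            problems.append(f"{object_id}: hash mismatch")
    return problems
```

An object still carrying the old 2-byte binary header would be reported as malformed, since it has no "<type> <size>" prefix terminated by a NUL byte.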
All four passes complete and committed as sha256:5b3273e7.
- Pass 1: deleted compute_commit_id, compute_snapshot_id, snapshot_identity_bytes, commit_identity_bytes from snapshot.py; all 15 call sites updated to import hash_commit / hash_snapshot directly from ids.py
- Pass 2: flipped read priority in read_commit / read_snapshot: object store first, msgpack fallback second
- Pass 3: write_commit and write_snapshot write exclusively to the object store; commit_exists checks the object store path
- Pass 4: removed msgpack fallback read branches; deleted 3,308 legacy msgpack files from .muse/commits/ and .muse/snapshots/
The unified object store is now the sole storage layer for all commits, snapshots, and blobs in the muse repo.
Remaining cleanup — load-bearing order
Migration ran successfully (9,706 blobs + 1,084 snapshots + 1,111 commits moved to unified object store). All seven phases are implemented. What remains is four shim-removal passes:
Pass 1: delete compute_commit_id / compute_snapshot_id shims
File: muse/core/snapshot.py lines 339–511
Both are already one-liner wrappers that delegate to hash_commit / hash_snapshot in ids.py. The compute_commit_id docstring even says "Deprecated: use muse.core.ids.hash_commit instead." There are ~15 call sites across 13 files (commit.py, merge.py, rebase.py, cherry_pick.py, pull.py, shelf.py, revert.py, bridge.py, commit_tree.py, snapshot_cmd.py, merge_tree.py, mpack.py, core/rebase.py). Each just needs its import changed from snapshot to ids. Zero behavior change, zero risk.
Pass 2: flip read priority in read_commit / read_snapshot
File: muse/core/store.py
Currently both functions check the legacy msgpack path first and fall back to the object store only when the file is absent. That priority should be reversed: object store first, msgpack as legacy fallback. This makes the object store the canonical read source. No data loss risk, since both paths hold the same records.
Pass 3: stop dual-writing msgpack in write_commit / write_snapshot
File: muse/core/store.py lines 1632–1636 and 2210–2214
After pass 2 flips read priority, new commits and snapshots no longer need to land in .muse/commits/ and .muse/snapshots/. Remove the _write_msgpack_atomic call from each writer; keep only the object store write. Pass 2 and pass 3 can land in the same commit.
Pass 4: delete legacy msgpack dirs and remove fallback reads
Files: muse/core/store.py + all repos on disk
Once we have run on object-store-only writes for a while and are confident, remove the fallback read branches in read_commit (lines 1736–1746) and read_snapshot (lines 2233–2244), then delete .muse/commits/ and .muse/snapshots/ from every repo. At that point the unified object store is the only storage layer and the ticket is fully closed.