Replace 2-byte header with Git-idiomatic type prefix in unified object store
Problem
The unified object store (#58) was implemented with a 2-byte binary header [type_byte, version_byte] prepended to object payloads. This is not how Git works.
Git bakes the type into the content before hashing and storing:
store: "<type> <size>\0<payload>"
hash: sha256("<type> <size>\0<payload>")
on disk: objects/<2hex>/<62hex> ← same path layout, already correct
The type is recoverable by reading the object, so no separate header is needed. Because the type prefix is part of the hashed bytes, a blob and a commit with identical payload bytes get different object IDs and cannot overwrite each other.
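A minimal sketch of the ID computation described above, to make the collision argument concrete. The helper name `object_id` is hypothetical and used only for illustration; it is not an existing function in the codebase.

```python
import hashlib


def object_id(type_str: str, payload: bytes) -> str:
    # Hypothetical helper: hash "<type> <size>\0<payload>" with SHA-256.
    header = f"{type_str} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha256(header + payload).hexdigest()


payload = b"same bytes"
blob_id = object_id("blob", payload)
commit_id = object_id("commit", payload)

# The payloads are identical, but the type prefix differs, so the hashed
# byte strings differ and the two IDs are distinct.
print(blob_id != commit_id)  # True
```

Since `<size>` is the payload length in bytes, the header is fully determined by the type and the payload, and nothing outside the object is needed to recompute or verify its ID.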
What needs to change
Phase 1 — Replace the format primitives
- Remove the OBJECT_TYPE_COMMIT, OBJECT_TYPE_SNAPSHOT, OBJECT_TYPE_BLOB, and OBJECT_FORMAT_V1 constants
- Remove write_typed_object / read_typed_object from object_store.py
- Add write_git_object(repo, type_str, payload) -> str: prepends "<type> <size>\0", hashes the full byte string, writes to objects/<algo>/<2>/<62>, returns the object ID
- Add read_git_object(repo, object_id) -> tuple[str, bytes] | None: reads the file, parses up to \0 to extract type and size, returns (type_str, payload)
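The two new primitives could look roughly like the sketch below. It rests on several assumptions that the issue does not confirm: `repo` is a `pathlib.Path` pointing at the repository root, objects live under `.muse/objects/sha256/`, files are stored uncompressed, and a missing object yields `None`. The `_object_path` helper and the `ALGO` constant are hypothetical. `Optional[tuple[str, bytes]]` is used as an equivalent spelling of `tuple[str, bytes] | None`.

```python
import hashlib
from pathlib import Path
from typing import Optional

ALGO = "sha256"  # assumption: the <algo> path segment


def _object_path(repo: Path, object_id: str) -> Path:
    # Hypothetical helper: objects/<algo>/<2 hex chars>/<remaining 62 hex chars>
    return repo / ".muse" / "objects" / ALGO / object_id[:2] / object_id[2:]


def write_git_object(repo: Path, type_str: str, payload: bytes) -> str:
    # Prepend "<type> <size>\0", hash the full byte string, store it under its ID.
    header = f"{type_str} {len(payload)}".encode("ascii") + b"\0"
    data = header + payload
    object_id = hashlib.sha256(data).hexdigest()
    path = _object_path(repo, object_id)
    if not path.exists():  # content-addressed: an existing file has the same bytes
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return object_id


def read_git_object(repo: Path, object_id: str) -> Optional[tuple[str, bytes]]:
    path = _object_path(repo, object_id)
    if not path.is_file():
        return None  # assumption: a missing object is reported as None
    header, sep, payload = path.read_bytes().partition(b"\0")
    if not sep:
        raise ValueError(f"object {object_id} has no header terminator")
    type_str, _, size_str = header.decode("ascii").partition(" ")
    if int(size_str) != len(payload):
        raise ValueError(f"object {object_id} has a size mismatch")
    return type_str, payload
```

The sketch splits on the first `\0` only, so payloads that themselves contain NUL bytes round-trip unchanged. A production version would probably also write through a temporary file and rename it, so that a crash cannot leave a truncated object at its final path.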
Phase 2 — Update ids.py
- hash_blob(data) → sha256("blob <size>\0" + data)
- hash_snapshot(manifest) → sha256("snapshot <size>\0" + canonical_bytes)
- hash_commit(...) → sha256("commit <size>\0" + canonical_bytes)
All existing IDs change. Migration of old objects is a separate pass (already planned).
Phase 3 — Wire write_commit / read_commit
- write_commit dual-writes: msgpack to .muse/commits/ (old), git-object to .muse/objects/ (new)
- read_commit falls back to .muse/objects/ via read_git_object when the old path is absent
Phase 4 — Wire write_snapshot / read_snapshot
Same pattern as Phase 3.
Phase 5 — Blobs
- write_object uses write_git_object with type_str="blob"
- read_object uses read_git_object and strips the header
Phase 6 — Tests
Rewrite tests/test_unified_object_store.py to use the new primitives throughout. Remove all references to the 2-byte header constants.
Phase 7 — Migration
Backfill all existing .muse/commits/ and .muse/snapshots/ msgpack objects into .muse/objects/ in the git-idiomatic format. Delete old directories once migration is verified.
Acceptance criteria
- All objects (blobs, snapshots, commits) live in .muse/objects/sha256/<2>/<62>
- On-disk format is "<type> <size>\0<payload>", with no binary header
- Object ID is sha256("<type> <size>\0<payload>") for all types
- Reading any object by ID yields its type without any separate metadata
- All 8 tests in test_unified_object_store.py pass against the new format
- No references to OBJECT_TYPE_* or OBJECT_FORMAT_V1 remain in the codebase