gabriel / musehub public
Closed #58
filed by gabriel human · 26 days ago

Unify commits, snapshots, and blobs into a single object store

Problem

.muse has three separate content-addressed stores:

.muse/commits/sha256/<id>.msgpack
.muse/snapshots/sha256/<id>.msgpack
.muse/objects/sha256/<prefix>/<id>

This is a historical accident. Commits and snapshots are already sha256 content-addressed, just like blobs. There is no principled reason they live in separate directories with a separate format.

Git does it right: one objects/ store, everything in it, type encoded in the object itself, 2-hex-char path prefix.

Target state

.muse/objects/sha256/
  01/c8de44...   ← blob
  4d/8b40ce...   ← snapshot
  6a/9cf0e8...   ← blob
  6f/a10416...   ← blob
  8d/bb26b7...   ← commit

One store. Same lookup path for everything. Type is encoded in the object data, not the directory.
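
A minimal sketch of the lookup path, assuming the file name is the remainder of the ID after the 2-hex-char prefix, as the listing above suggests. The function name and signature are hypothetical and are not the project's actual object_path helper.

```python
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def unified_object_path(repo_root: Path, object_id: str) -> Path:
    """Map a lowercase hex sha256 ID to its file in the unified store.

    Assumption: Git-style split, first two hex chars become the directory
    and the remaining 62 become the file name.
    """
    if len(object_id) != 64 or not set(object_id) <= HEX_DIGITS:
        raise ValueError(f"not a lowercase hex sha256 ID: {object_id!r}")
    prefix, rest = object_id[:2], object_id[2:]
    return repo_root / ".muse" / "objects" / "sha256" / prefix / rest


if __name__ == "__main__":
    # Same path computation for a blob, a snapshot, or a commit.
    print(unified_object_path(Path("."), "8d" + "0" * 62))
```

If the store keeps the full ID as the file name instead, only the last path component changes.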

Approach

TDD. Write the tests first:

  1. Write a commit object → read it back by ID from the unified store
  2. Write a snapshot object → read it back by ID from the unified store
  3. Write a blob → read it back by ID (unchanged behavior)
  4. Mixed read: given only an ID, resolve the correct object regardless of type
  5. Migration: existing repo with separate commits/ and snapshots/ dirs → migrated to unified store, all IDs resolve correctly

Tests must pass before any implementation changes to the store layout.

Notes

  • The format question (msgpack vs custom binary vs JSON) is separate and can be decided during implementation
  • The wire format is unaffected — this is on-disk layout only
  • .muse/commits/ and .muse/snapshots/ directories should be removed once migration is complete

Activity (4)

gabriel opened this issue 26 days ago
gabriel 26 days ago

Unified object format

Every object in .muse/objects/sha256/<prefix>/<id> uses this layout:

byte 0:   type     — 0x01=commit  0x02=snapshot  0x03=blob
byte 1:   version  — 0x01 (v1 of this unified format)
byte 2+:  payload  — type-specific body (msgpack for commits and snapshots, raw bytes for blobs)

Type registry

byte   type      payload
0x01   commit    CommitRecord (msgpack)
0x02   snapshot  SnapshotRecord (msgpack)
0x03   blob      raw bytes

Rules

  • Reader checks byte 0 to know what it is, byte 1 to know how to parse it
  • Unknown type byte → reject, do not parse
  • Unknown version byte → reject, do not parse
  • Payload format may change per version; old readers reject versions they do not know instead of misparsing them
  • The object ID (sha256) is computed over the full bytes including the 2-byte header
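
The rules above can be sketched as follows. This is an illustration only: the function and constant names are hypothetical, and the payload is treated as opaque bytes so the sketch does not depend on a msgpack library.

```python
import hashlib

TYPE_COMMIT, TYPE_SNAPSHOT, TYPE_BLOB = 0x01, 0x02, 0x03
KNOWN_TYPES = {TYPE_COMMIT: "commit", TYPE_SNAPSHOT: "snapshot", TYPE_BLOB: "blob"}
KNOWN_VERSIONS = {0x01}


def encode_object(type_byte: int, payload: bytes, version: int = 0x01) -> bytes:
    """Prepend the 2-byte header (type, version) to an already-serialized payload."""
    if type_byte not in KNOWN_TYPES:
        raise ValueError(f"unknown type byte: {type_byte:#04x}")
    if version not in KNOWN_VERSIONS:
        raise ValueError(f"unknown version byte: {version:#04x}")
    return bytes([type_byte, version]) + payload


def decode_object(data: bytes) -> tuple[str, int, bytes]:
    """Check byte 0 and byte 1 before touching the payload; reject anything unknown."""
    if len(data) < 2:
        raise ValueError("object too short to contain a header")
    type_byte, version = data[0], data[1]
    if type_byte not in KNOWN_TYPES:
        raise ValueError(f"unknown type byte: {type_byte:#04x}")
    if version not in KNOWN_VERSIONS:
        raise ValueError(f"unknown version byte: {version:#04x}")
    return KNOWN_TYPES[type_byte], version, data[2:]


def object_id(data: bytes) -> str:
    """sha256 over the full object bytes, header included."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    raw = encode_object(TYPE_BLOB, b"hello")
    print(object_id(raw), decode_object(raw))
```

Because the header is part of the hashed bytes, the same payload stored as a blob and as a snapshot gets two different IDs.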

gabriel 25 days ago

Progress update

Done

  • muse.core.ids created as the canonical home for all ID derivation
  • hash_blob, hash_snapshot, hash_commit implemented in ids.py with canonical hash_* naming convention
  • DEFAULT_HASH_ALGO and long_id moved into ids.py (self-contained, no imports from types.py)
  • Deprecation comments added to blob_id in types.py and compute_commit_id / compute_snapshot_id in snapshot.py
  • tests/test_unified_object_store.py created with 3 passing tests:
    • test_write_read_blob — raw bytes stored and retrieved correctly
    • test_write_read_snapshot — snapshot JSON stored with type+version header, read back correctly
    • test_write_read_commit — full CommitRecord field shape stored as JSON, read back correctly
  • All tests use object_path and objects_dir from muse.core.object_store — no hardcoded path strings
  • Fixture chain: blob_id → snapshot_id → commit_id, snapshot_data, commit_data

Remaining

  • store.py: update write_commit and write_snapshot to write to objects/sha256/ with type+version header
  • store.py: update read_commit and read_snapshot to read from objects/sha256/ with legacy fallback
  • gc.py: update reachability walks to scan objects/sha256/ for commits and snapshots
  • Remove .muse/commits/ and .muse/snapshots/ once all repos have migrated

gabriel 25 days ago

Update — Test 4 complete

All 4 tests passing:

  • test_write_read_blob
  • test_write_read_snapshot
  • test_write_read_commit
  • test_mixed_read_resolves_type ✅ — all three objects written to the same store; given only an ID, the type byte correctly identifies each one without any prior knowledge of what was stored
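
For readers who want to see the shape of the mixed-read check, here is a self-contained sketch. It is not the actual test from tests/test_unified_object_store.py: write_object, read_object, and check_mixed_read are hypothetical names, the header is the 2-byte type+version layout from the first comment, and the payloads are placeholder bytes.

```python
import hashlib
import tempfile
from pathlib import Path

TYPE_NAMES = {0x01: "commit", 0x02: "snapshot", 0x03: "blob"}
FORMAT_VERSION = 0x01


def write_object(store: Path, type_byte: int, payload: bytes) -> str:
    """Write header + payload under <prefix>/<rest>; return the object ID."""
    data = bytes([type_byte, FORMAT_VERSION]) + payload
    oid = hashlib.sha256(data).hexdigest()
    path = store / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return oid


def read_object(store: Path, oid: str) -> tuple[str, bytes]:
    """Resolve an ID to (type name, payload) using only the stored header."""
    data = (store / oid[:2] / oid[2:]).read_bytes()  # FileNotFoundError if missing
    if len(data) < 2 or data[0] not in TYPE_NAMES or data[1] != FORMAT_VERSION:
        raise ValueError(f"unreadable object header for {oid}")
    return TYPE_NAMES[data[0]], data[2:]


def check_mixed_read() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "objects" / "sha256"
        ids = {
            "commit": write_object(store, 0x01, b"commit-body"),
            "snapshot": write_object(store, 0x02, b"snapshot-body"),
            "blob": write_object(store, 0x03, b"raw-bytes"),
        }
        # Only the ID is passed in; the type comes back from byte 0.
        for expected, oid in ids.items():
            kind, _payload = read_object(store, oid)
            assert kind == expected


if __name__ == "__main__":
    check_mixed_read()
```

The same read_object call also covers the missing-object case, since read_bytes raises FileNotFoundError for an ID that was never written.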

Remaining tests (per issue spec)

  • Test 5: migration — existing repo with separate commits/ and snapshots/ dirs resolves all IDs correctly after migration
  • Test 6: missing object raises FileNotFoundError
  • Test 7: on-disk path layout

Remaining implementation

  • store.py: write_commit / write_snapshot write to objects/sha256/ with type+version header
  • store.py: read_commit / read_snapshot read from objects/sha256/ with legacy fallback
  • gc.py: reachability walks scan objects/sha256/ for commits and snapshots
  • Remove .muse/commits/ and .muse/snapshots/ once all repos migrated

gabriel 20 days ago

All deliverables complete and verified:

  • Unified object store implemented in muse/core/object_store.py — one store, Git-style <type> <len>\0<payload> format
  • commits.py::write_commit and snapshots.py::write_snapshot both write to .muse/objects/sha256/
  • commit_path and snapshot_path both route to the unified store via object_path()
  • gc.py reachability walk covers all three types (blob, commit, snapshot) in the unified store
  • Migration tooling in muse/core/migrate.py handles repos with legacy .muse/commits/ and .muse/snapshots/ dirs
  • 23 tests passing: test_unified_object_store.py (13) + test_migrate_object_store.py (10)
  • Dead imports of snapshots_dir/commits_dir removed from gc.py, snapshot_cmd.py, snapshot_diff.py, verify.py
  • All docstrings, CLI help text, and docs updated to reference .muse/objects/sha256/
  • Stale test names renamed to reflect unified store
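
The Git-style <type> <len>\0<payload> framing named in the first bullet differs from the 2-byte header proposed earlier in this thread. The exact details are not spelled out here, so the sketch below assumes Git's own convention: an ASCII type name, a space, the payload length in decimal, a NUL byte, then the payload. Function names are hypothetical.

```python
KNOWN_KINDS = {"blob", "commit", "snapshot"}


def encode_git_style(kind: str, payload: bytes) -> bytes:
    """Frame a payload as b'<type> <len>\\0<payload>'."""
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unknown object type: {kind!r}")
    return f"{kind} {len(payload)}".encode("ascii") + b"\0" + payload


def decode_git_style(data: bytes) -> tuple[str, bytes]:
    """Split at the first NUL, then validate the type name and declared length."""
    header, sep, payload = data.partition(b"\0")
    if not sep:
        raise ValueError("missing NUL separator after header")
    kind_raw, space, len_raw = header.partition(b" ")
    if not space or not len_raw.isdigit():
        raise ValueError("malformed header")
    kind = kind_raw.decode("ascii")  # UnicodeDecodeError is a ValueError
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unknown object type: {kind!r}")
    if int(len_raw) != len(payload):
        raise ValueError("declared length does not match payload size")
    return kind, payload


if __name__ == "__main__":
    framed = encode_git_style("commit", b"example-body")
    print(framed, decode_git_style(framed))
```

The declared length gives the reader a cheap truncation check before it parses the payload.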

Closing.