gabriel / musehub public
Open #60 Enhancement
filed by gabriel human · 18 days ago

snapshot.directories: list[str] → dict[str, str] — make empty dirs fully first-class in the snapshot schema

Background

Empty directories are now content-addressed at the stage level (EMPTY_DIR_OID = blob_id(b'')). However, snapshot.directories is still list[str]: a list of path strings with no object_id. This is inconsistent with the file manifest, which maps path → sha256 object_id.

The goal of this issue is to make the snapshot schema fully symmetric:

# Current
snapshot.manifest    = {"src/main.py": "sha256:abc..."}
snapshot.directories = ["mydir", "emptydir"]

# Target
snapshot.manifest    = {"src/main.py": "sha256:abc..."}
snapshot.directories = {"mydir": "sha256:473a...", "emptydir": "sha256:473a..."}

All empty directories share the same object_id (EMPTY_DIR_OID = sha256:473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813) since they have identical content (zero bytes). This is correct and efficient — one object in the store, referenced by all dir entries.
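
For reference, the quoted value is not the SHA-256 of a bare empty byte string (that digest begins e3b0c442). It matches the ID Git assigns to an empty blob in SHA-256 mode, where the payload is hashed behind a "blob <size>" header terminated by a NUL byte. The sketch below assumes blob_id follows that convention; the function body is a guess at its behaviour, not the project's code.

```python
import hashlib


def blob_id(data: bytes) -> str:
    # Assumption: a git-style header ("blob <size>" + NUL) is hashed ahead of
    # the payload. This is inferred from the constant, not taken from the source.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return "sha256:" + hashlib.sha256(header + data).hexdigest()


EMPTY_DIR_OID = blob_id(b"")

# Every directory entry points at the same object, so the store holds one blob.
snapshot_directories = {"mydir": EMPTY_DIR_OID, "emptydir": EMPTY_DIR_OID}
distinct_objects = set(snapshot_directories.values())
```

Under that assumption, any number of directory entries still resolves to a single stored object.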

Why this matters

  • Consistency: snapshot is the unit of content-addressing; a path list with no object_ids is structurally unlike every other part of the system
  • Push/pull completeness: with list[str], push cannot verify dir objects are present on the remote; with dict, dirs participate in the same object-transfer logic as files
  • Future metadata: if dirs ever gain content (permissions, ACLs, .musekeep metadata), the dict schema already supports it — list[str] does not
  • Rust port alignment: the Rust port should not need to special-case directories differently from files in the snapshot layer

Implementation plan (TDD, phased)

Phase 1 — Schema definition and read/write migration

Tests first:

  • snapshot_record_accepts_directories_as_dict
  • read_snapshot_with_list_dirs_migrates_to_dict (backward compat)
  • hash_snapshot_with_dict_dirs_is_stable (deterministic hash)
  • write_then_read_snapshot_preserves_dir_dict

Implementation:

  • Change SnapshotRecord.directories: list[str] → dict[str, str]
  • Update hash_snapshot(manifest, directories) to accept dict
  • Update write_snapshot / read_snapshot: read list format migrates transparently to dict (all values become EMPTY_DIR_OID)
  • Update SnapshotManifest TypedDict to match
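
A minimal sketch of the Phase 1 read path and hash, under stated assumptions: normalize_directories is a hypothetical helper, and the sorted-key JSON encoding inside hash_snapshot is an illustrative stand-in because the issue does not describe the real encoding.

```python
import hashlib
import json
from typing import TypedDict, Union

EMPTY_DIR_OID = "sha256:473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813"

Directories = Union[list[str], dict[str, str]]


class SnapshotManifest(TypedDict):
    manifest: dict[str, str]
    directories: dict[str, str]


def normalize_directories(raw: Directories) -> dict[str, str]:
    """Accept the legacy list form or the new dict form; always return a dict."""
    if isinstance(raw, dict):
        return dict(raw)
    # Legacy format: every listed path becomes an entry pointing at EMPTY_DIR_OID.
    return {path: EMPTY_DIR_OID for path in raw}


def hash_snapshot(manifest: dict[str, str], directories: Directories) -> str:
    """Hypothetical canonical hash: normalize first, then hash sorted-key JSON."""
    canonical = json.dumps(
        {"manifest": manifest, "directories": normalize_directories(directories)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Normalizing before hashing makes the digest independent of the input format and of key order. Whether it also reproduces snapshot IDs that were computed under the old list format depends on the existing encoding, which would need to be checked against the real hash_snapshot.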

Phase 2 — Plugin and stage layer

Tests first:

  • plugin_snapshot_returns_directories_as_dict
  • stage_status_reads_dir_oid_from_snapshot_dict
  • workdir_snapshot_directories_is_dict

Implementation:

  • CodePlugin.snapshot() returns SnapshotManifest with directories as dict
  • workdir_snapshot() same
  • _head_snapshot_dirs_for() returns dict, not list
  • Callers that iterate directories update accordingly (set(dirs) → set(dirs.keys()); set(dirs) on a dict already yields the keys, so this change is for explicitness)
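
As a rough illustration of what a dict-returning working-directory scan could look like, the sketch below walks a tree and maps each empty directory to EMPTY_DIR_OID. The function name empty_directories is hypothetical, it assumes directories tracks only empty directories, and it ignores concerns such as repository metadata folders or ignore rules that the real workdir_snapshot() presumably handles.

```python
import os
from pathlib import Path

EMPTY_DIR_OID = "sha256:473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813"


def empty_directories(root: Path) -> dict[str, str]:
    """Map each empty directory below root (POSIX-style relative path) to EMPTY_DIR_OID."""
    found: dict[str, str] = {}
    for current, dirnames, filenames in os.walk(root):
        # A directory counts as empty when it has no files and no subdirectories.
        # The root itself is never reported.
        if not dirnames and not filenames and Path(current) != root:
            found[Path(current).relative_to(root).as_posix()] = EMPTY_DIR_OID
    return found
```

Callers that only need the paths can keep iterating the result directly, since iterating a dict yields its keys.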

Phase 3 — Diff engine

Tests first:

  • diff_detects_dir_added_from_dict_snapshots
  • diff_detects_dir_removed_from_dict_snapshots
  • diff_detects_dir_renamed_via_content_match (all empty dirs share one oid, so renamed pairs are matched by elimination)

Implementation:

  • plugin.diff() consumes dict directories
  • Directory rename detection compares dict keys (paths); object_ids alone cannot tell empty dirs apart because they are all EMPTY_DIR_OID
  • AddressedInsertOp / DeleteOp / RenameOp emission unchanged
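
One possible reading of "match by elimination" is sketched below. DirChange is a hypothetical stand-in for the real AddressedInsertOp / DeleteOp / RenameOp types, and the rule of treating only the one-removed, one-added case as a rename is an assumption, not something the issue specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DirChange:
    kind: str  # "insert", "delete", or "rename"
    path: str
    new_path: Optional[str] = None


def diff_directories(old: dict[str, str], new: dict[str, str]) -> list[DirChange]:
    """Compare two dict-format directory maps by path."""
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())

    # All empty directories share one OID, so an OID match cannot pair a
    # removed path with an added one when there are several candidates.
    # Only the unambiguous one-to-one case is reported as a rename here.
    if len(removed) == 1 and len(added) == 1 and old[removed[0]] == new[added[0]]:
        return [DirChange("rename", removed[0], added[0])]

    changes = [DirChange("delete", path) for path in removed]
    changes += [DirChange("insert", path) for path in added]
    return changes
```

With two or more removals and additions in the same diff, this sketch falls back to separate delete and insert changes rather than guessing a pairing.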

Phase 4 — Push/pull wire format

Tests first:

  • push_includes_empty_dir_object_in_mpack
  • pull_receives_and_stores_empty_dir_object
  • bundle_inspect_shows_dir_objects

Implementation:

  • Push mpack builder: include EMPTY_DIR_OID object when any dir is in snapshot
  • Pull/unbundle: no change needed (object store handles zero-byte blobs naturally)
  • bundle diff/inspect: show dir objects
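
The point of Phase 4 is that directory object_ids join the same "what does the remote need" computation as file object_ids. A minimal sketch, assuming a hypothetical objects_to_send helper and a remote_has set that the real push negotiation would supply:

```python
EMPTY_DIR_OID = "sha256:473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813"


def objects_to_send(
    manifest: dict[str, str],
    directories: dict[str, str],
    remote_has: set[str],
) -> set[str]:
    """Return the object_ids the snapshot references that the remote lacks."""
    # Files and directories contribute to one set, so EMPTY_DIR_OID appears
    # at most once no matter how many directories reference it.
    wanted = set(manifest.values()) | set(directories.values())
    return wanted - remote_has
```

A snapshot with no directories never requests EMPTY_DIR_OID, which is consistent with including the object only "when any dir is in snapshot".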

Phase 5 — CLI surface (status, diff, read)

Tests first:

  • status_json_staged_added_uses_dir_oid_not_sentinel_string
  • read_manifest_includes_directories_dict
  • diff_json_includes_dir_object_id_in_ops

Implementation:

  • status --json: staged.added dir entries show object_id = sha256:473a...
  • read --manifest: directories field is dict not list
  • Any display that showed the 'dir:' sentinel string now shows the sha256 object_id

Phase 6 — Snapshot migration tool

Tests first:

  • migrate_existing_repo_converts_list_dirs_to_dict
  • migration_is_idempotent (run twice = no change)
  • migration_preserves_commit_ids (commits are not rewritten, only snapshots)

Implementation:

  • muse migrate --dirs (or automatic on first access)
  • Walk all snapshots, rewrite those with list-format directories
  • Log count of migrated snapshots
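
The migration pass can be sketched over an in-memory store, assuming snapshots are records keyed by snapshot ID with a "directories" field. The store layout and the migrate_snapshots name are hypothetical; the on-disk format is not described in this issue.

```python
EMPTY_DIR_OID = "sha256:473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813"


def migrate_snapshots(store: dict[str, dict]) -> int:
    """Rewrite list-format 'directories' in place; return how many records changed."""
    migrated = 0
    for record in store.values():
        dirs = record.get("directories", {})
        if isinstance(dirs, list):
            record["directories"] = {path: EMPTY_DIR_OID for path in dirs}
            migrated += 1
        # Records already in dict format are left untouched, which is what
        # makes a second run a no-op.
    return migrated
```

Because only the "directories" value is rewritten, the keys of the store (and anything that refers to them, such as commits) are unchanged, and the returned count is the number to log.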

Invariants across all phases

  • EMPTY_DIR_OID = sha256:473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813 (never changes)
  • All existing repos with list-format directories must migrate transparently — no manual user action
  • hash_snapshot output must be stable: same files + dirs → same hash regardless of list vs dict input format
  • All 107 existing directory tests must remain green throughout

Out of scope

  • Non-empty directory objects (tree objects) — future work
  • Changing EMPTY_DIR_OID itself — the zero-byte blob is correct and stable
  • Renaming snapshot fields — keep backward compat naming