gabriel / muse public
Closed #9 Enhancement
filed by gabriel human · 31 days ago

Remove repo_id from CommitRecord and resolve_commit_ref — client/server parity

0 Anchors
Blast radius
Churn 30d
0 Proposals

Context

musehub removed repo_id from musehub_commits and musehub_snapshots in May 2026 — commits and snapshots are globally content-addressed objects; per-repo membership is tracked via MusehubCommitRef and MusehubSnapshotRef. The muse CLI still carries repo_id in CommitRecord and passes it as a dead parameter through ~50 commands. This ticket closes that gap.

Key invariants confirmed before planning:

  • compute_commit_id does NOT include repo_id — no existing commit hashes are invalidated
  • resolve_commit_ref takes repo_id but never uses it — it is a dead parameter
  • musehub already does d.get("repo_id", "") on wire commits — handles absence today
  • TagRecord and ReleaseRecord keep their repo_id — tag IDs hash with it and filesystem sharding uses it; those are correctly repo-scoped

Phase 1 — Decouple repo_id from CommitRecord (forward/backward compat)

Scope: muse/core/store.pyCommitRecord, CommitDict, from_msgpack, from_dict, to_dict

  • Give repo_id a default of "" on CommitRecord so it is optional at construction
  • to_dict(): stop emitting repo_id — musehub already handles its absence
  • from_msgpack(): keep reading it (silently discard) so old on-disk files parse without error
  • No on-disk rewrite. No wire breakage.

Phase 2 — Remove repo_id from resolve_commit_ref

Scope: muse/core/store.py::resolve_commit_ref + all ~50 command callers

  • Drop repo_id: str parameter from resolve_commit_ref — unused in the function body
  • Remove repo_id argument from every call site in muse/cli/commands/
  • Commands affected: age, agent_map, annotate, api_surface, apply_patch, archive, bisect, blame, blast_risk, branch, breakage, bundle, cadence, cat, check, checkout, checkout_symbol, cherry_pick, clones, codemap, compare, content_grep, contour, contract, core_blame, core_cat, coupling, and ~25 more

Phase 3 — Remove repo_id from commit creation sites

Scope: muse/cli/commands/commit.py, commit_tree.py, bridge.py, test helpers

  • commit.py: stop reading repo_id = read_repo_id(root) when its only use was CommitRecord(repo_id=repo_id, ...)
  • commit_tree.py: same
  • bridge.py: cr.repo_id used in two places — replace with read_repo_id(root) directly where the git bridge still needs it
  • muse/core/mpack.py: repo_id param on build_mpack is for tag lookup only, not embedded in commits — keep that param, just stop reading it from CommitRecord

Phase 4 — Delete repo_id from CommitRecord and CommitDict entirely

Scope: muse/core/store.py

  • Remove repo_id: str field from CommitRecord dataclass
  • Remove repo_id: str from CommitDict TypedDict
  • Remove repo_id=self.repo_id from to_dict()
  • Keep from_msgpack silently ignoring repo_id if present — this compat shim stays permanently

Gate: Phases 1-3 must be complete and all tests green before this runs.


Phase 5 — Prune orphaned read_repo_id imports

Scope: ~20 command files

  • After Phases 2 and 3, commands whose only use of read_repo_id was for commit resolution no longer need it
  • Commands still using read_repo_id for tags and releases keep it
  • Grep each file post-Phase-3 for remaining repo_id usage to determine which imports to drop

Phase 6 — Docs, tests, wire verification

  • Update docs/reference/type-contracts.md — remove repo_id rows from CommitRecord and CommitDict tables
  • Update docs/commit-id-v2-rewrite.md — remove the Step 2 Load repo_id section
  • Update docs/reference/plumbing.md, cli-tiers.md, remotes.md — remove repo_id from example commit JSON
  • Tests: find all CommitRecord(repo_id=...) constructors and remove the kwarg
  • Wire smoke test: push a commit from patched CLI to staging, confirm musehub accepts it without repo_id in the payload

Execution order

Phases are strict. Phase 1 is the safety net — makes repo_id optional so Phases 2-4 can be done file-by-file without breaking anything mid-flight. Phase 4 (hard delete) only lands after all call sites are clean. Phases 5 and 6 are cleanup.

Activity8
gabriel opened this issue 31 days ago
gabriel 31 days ago

Phase 1 complete ✅

Branch: task/issue-9-phase1-commit-record-repo-id → merged to dev → pushed to local/dev

What shipped

  • CommitRecord.repo_id demoted to optional field (str = "") positioned after committed_at, so old on-disk records that serialised the field deserialise without error
  • CommitDict.repo_id removed — field is no longer emitted in to_dict(), not passed in from_msgpack(), not passed in from_dict()
  • from_msgpack() silently ignores any repo_id present in existing on-disk files (backward-compat read, forward-clean write)
  • Test updated: round-trip assertion on repo_id dropped, constructor call cleaned up
  • 142 tests pass

Up next: Phase 2

Remove repo_id as a dead parameter from resolve_commit_ref and all its call sites.

gabriel 31 days ago

Phase 2 complete ✅

Branch: task/issue-9-phase2-resolve-commit-ref → merged to dev → pushed to local/dev

What shipped

  • resolve_commit_ref(repo_root, repo_id, branch, ref)resolve_commit_ref(repo_root, branch, ref)
  • repo_id was a dead parameter — never read in the function body, only threaded through the recursive tilde-notation call
  • 91 call sites updated across 87 command files, 2 core files (store.py, doc_history.py), 4 test files
  • Caught and fixed an aliased call in narrative.py (_rcr local import also used the old 4-arg form)
  • 1215 tests pass

Up next: Phase 3

Clean all commit-creation sites — stop passing repo_id when constructing CommitRecord objects.

gabriel 31 days ago

Phase 3 complete — merged to dev (sha256:58267095dfd3).

Removed repo_id= from every CommitRecord(...) constructor call across the entire codebase:

  • Source files: muse/cli/commands/bridge.py, muse/cli/commands/rebase.py (plus all commands that pass through write_commit)
  • Test files: ~190 files cleaned
  • Preserved intact: TagRecord, ReleaseRecord, RemoteInfo, CommitDict, _init_repo helpers, compute_release_id calls, TypedDict instantiations — all have their own repo scoping semantics

196 files changed in one atomic commit.

gabriel 31 days ago

Phase 4 complete — merged to dev (sha256:e3248448b953).

Hard-deleted repo_id: str = "" from CommitRecord dataclass. The CommitDict TypedDict was already clean (no repo_id field), to_dict() was already not emitting it, and from_msgpack() already silently ignores it when present in old on-disk files — the compat shim remains permanently by omission.

gabriel 31 days ago

Phase 4 sweep complete — merged to dev (sha256:fded69da598f).

Three additional items caught on re-sweep:

  • store.py module docstring commit schema: removed stale "repo_id" key from the example JSON block
  • _verify_commit_id docstring: removed repo_id from the compute_commit_id coverage list (it was never in the formula)
  • get_commits_for_branch: removed dead repo_id: str parameter that was never read in the function body, plus its docstring entry

Updated 21 call sites across 9 files (log.py, shortlog.py, and 7 test files). 512 tests pass.

gabriel 31 days ago

Phase 5 complete — orphaned read_repo_id imports pruned

Commit: sha256:4d399952496819361713d9828d1978f4978a517d2dfd234d05c7c76c1c5fa707

What was done

Removed all dead read_repo_id import + dead-assignment patterns from ~70 command files that were left behind after Phases 2–4 stripped the legitimate repo_id usages.

DROP files (~70 command files): Removed from muse.core.repo import read_repo_id and the repo_id = read_repo_id(root) assignment line from every file where the result was never consumed downstream.

Dead-param helper functions cleaned (6 files):

  • branch.py_resolve_start_point lost its repo_id: str param
  • checkout.py_checkout_with_merge lost its repo_id: str param
  • bundle.py_resolve_refs lost its repo_id: str param
  • lineage.py_gather_commits lost its repo_id: str param
  • midi_compare.py_load lost its repo_id: str param
  • type_cmd.py_get_manifest lost its repo_id: str param; entire read_repo_id import dropped

TypedDict cleaned: _ReadCommitJson in read_commit.py lost its repo_id: str field.

Test fixtures updated:

  • test_cmd_branch.py — updated _resolve_start_point call sites
  • test_cmd_checkout.py — removed dead assignment
  • test_cmd_plan_merge.py — removed 10 stale patch('...plan_merge.read_repo_id') calls
  • test_cmd_pull_hardening.py — removed 1 stale patch('...pull.read_repo_id') call
  • test_rebase_missing_snapshot_guard.py — removed dead import + assignment
  • test_directories_feature.py — removed dead import + assignment from 3 locations

KEEP files (untouched): bridge.py, describe.py, docs_cmd.py, log.py, release.py, shortlog.py, snapshot_cmd.py, tag.py, verify.py — all legitimately use read_repo_id for tag/release/identity purposes.

Tests passing

  • test_phase5_store_linear_walks.py: 7 passed
  • test_core_store.py + test_core_coverage_gaps.py: 70 passed
  • test_cmd_branch.py + test_cmd_checkout.py: 273 passed
  • test_core_repo.py + test_cmd_tag_hardening.py + test_rebase_missing_snapshot_guard.py + test_directories_feature.py: 303 passed
  • test_cmd_plan_merge.py + test_cmd_pull_hardening.py: 121 passed (8 pre-existing failures in pull tests unrelated to Phase 5)

Next: Phase 6 — update documentation files (type-contracts.md, commit-id-v2-rewrite.md, plumbing.md, cli-tiers.md, remotes.md) to remove all repo_id references from CommitRecord docs.

gabriel 31 days ago

Phase 6 complete — documentation updated

Commit: sha256:9ee4bb8346c8f20e0aade6c5462a69a6328cc2102b1890bd6c09b27f528507b7

Changes

docs/reference/type-contracts.md: Removed repo_id row from the CommitDict field table. The field was removed from CommitRecord and CommitDict in Phase 4; the doc entry was stale.

docs/reference/plumbing.md: Removed repo_id and format_version from the read-commit example JSON output. Dropped the stale sentence referencing format_version in the command description.

docs/reference/cli-tiers.md: Removed repo_id from the read-commit key fields JSON example.

docs/reference/remotes.md: Removed repo_id from the MPack wire format commits array example (CommitRecord fields on the wire no longer include repo_id).

docs/commit-id-v2-rewrite.md: Deleted. This was a planning document tracking the v2 commit ID rewrite. Phase 11 of that document explicitly said 'Delete this doc; commit' — all 11 phases were already marked done.

What was NOT changed

All other repo_id references in docs remain — they describe types that still legitimately carry repo_id:

  • RemoteInfo, WireTag (mpack.py) — remote/wire types that identify the repo
  • TagDict, ReleaseDict (store.py) — tags and releases still use repo_id for content addressing
  • Hub API response types (_HubApiResponse, _RepoJson, etc.) — server-side repo identity
  • Command output types for KEEP files (describe, shortlog, snapshot, verify) — all legitimately call read_repo_id

Issue #9 work is now complete across all 6 phases. All orphaned repo_id usages have been removed from CommitRecord, its TypedDicts, all command files, all test fixtures, and all documentation.

gabriel 31 days ago

All 6 phases verified complete. Reviewed code state in muse/core/store.py:

  • CommitRecord dataclass has no repo_id field (confirmed at line 939)
  • resolve_commit_ref signature is clean (2-arg: repo_root, branch, ref)
  • All CommitRecord(...) construction sites no longer pass repo_id=
  • ~70 command files pruned of dead read_repo_id imports
  • Docs updated: type-contracts.md, plumbing.md, cli-tiers.md, remotes.md; commit-id-v2-rewrite.md deleted
  • Remaining repo_id references in store.py are all TagRecord/ReleaseRecord/path functions — correctly repo-scoped, untouched per spec

Closing.