gabriel / musehub public
Closed #86 Enhancement merge
filed by gabriel human · 8 days ago

Comprehensive merge matrix: strategy × history × granularity parity between muse merge and proposal merge

0 Anchors
Blast radius
Churn 30d
0 Proposals

Background

Muse tracks merges at three granularities git cannot: directory, file, and symbol. It has Harmony (conflict-resolution memory) instead of rerere. In theory this gives Muse a dramatically smoother dev UX than git. In practice we are introducing phantom conflicts and regressions during merges — which means the merge layer is not fully dialed in yet.

The proximate trigger was issue #85: `muse hub proposal merge --strategy overlay` reporting phantom conflicts on files with identical object IDs on both sides. The deeper problem is that:

  1. `muse merge` (local) and `muse hub proposal merge` (server) expose different strategy and history options — the server has capabilities the local command lacks.
  2. The full strategy × history × granularity matrix has never been systematically tested.
  3. Harmony (conflict-resolution memory) interacts differently with each combination and those interactions are undocumented and untested.

This ticket makes Muse's merge layer genuinely class-apart from git.


Goal

  • Parity: every strategy and history mode available in `muse hub proposal merge` must also be available in `muse merge` locally. A proposal merge is a server-side automated `muse merge` — they must be identical in behavior.
  • Correctness: every cell in the strategy × history × granularity matrix must have a passing integration test.
  • Harmony: every merge path must correctly teach Harmony what it learned, or explicitly document why it cannot (e.g. squash destroys per-commit provenance).
  • No phantom conflicts: convergent edits and untouched files must never produce conflicts under any strategy or history mode.

Phase 1 — Canonical strategy vocabulary

Conceptual model

Every merge strategy answers exactly two questions:

1. What unit do you diff at? (diff_unit)

three_way:  combine(delta(base→ours), delta(base→theirs))  # uses the common ancestor
snapshot:   compare(ours_state, theirs_state)               # no base, final states only
replay:     apply(delta(base→ours), onto=theirs_state)      # one side's delta onto the other's state
            apply(delta(base→theirs), onto=ours_state)      # (direction is a flag)

2. When two units conflict, what do you do? (resolution_policy)

escalate:     surface the conflict — human or Harmony resolves
prefer_ours:  auto-resolve by keeping ours
prefer_theirs: auto-resolve by keeping theirs

These two axes produce every strategy. Named strategies are aliases for specific combinations — not independent code paths.

Internal architecture

class MergeEngine:
    diff_unit:  Literal["three_way", "snapshot", "replay_ours", "replay_theirs"]
    resolution: Literal["escalate", "prefer_ours", "prefer_theirs"]

Strategy table

Strategy diff_unit resolution Notes
`recursive` three_way escalate Default for `muse merge`
`overlay` snapshot prefer_theirs Default for proposals
`snapshot` snapshot escalate Snapshot-level with conflict surfacing
`replay` replay_ours escalate Ours' delta applied onto theirs' state

Convenience aliases (decompose internally to the above):

Alias Expands to
`--strategy ours` `--strategy recursive --on-conflict ours`
`--strategy theirs` `--strategy recursive --on-conflict theirs`

CLI design — two flags, both optional

`--strategy` answers "how do I compare the two sides?" Defaults to `recursive`.

`--on-conflict` answers "what do I do when they disagree?" Defaults to `escalate`.

Both flags are optional, both default to the right thing, and they compose freely:

# Normal merge — no flags needed
muse merge feat/x

# Auto-resolve all conflicts keeping ours
muse merge feat/x --on-conflict ours

# Overlay, surface conflicts
muse merge feat/x --strategy overlay

# Overlay, auto-resolve keeping theirs
muse merge feat/x --strategy overlay --on-conflict theirs

# Convenience alias (muscle-memory friendly)
muse merge feat/x --strategy ours

Why three_way + domain plugin = automatic 3-level granularity

The three levels of conflict granularity (directory, file, symbol) are not separate strategies. They fall out naturally from `three_way` when the domain plugin knows how to decompose a file into diffable units:

  • Symbol-level: domain plugin decomposes file → symbols; delta computed per symbol. Conflict only where the same symbol diverged. Untouched symbols auto-merge.
  • File-level: fallback when no domain plugin; delta computed on whole file bytes.
  • Directory-level: emergent — a directory conflict is just "all files under this prefix conflicted."

A file that looks like a conflict at the file level is often zero conflicts at the symbol level. This is Muse's core advantage over git. It is delivered by `three_way` + an active domain plugin — not by a separate strategy.

Deliverables

  • [✅] `muse merge` gains `--on-conflict escalate|ours|theirs` flag.
  • [✅] `muse merge` gains `--history merge|squash|rebase` flag (today it has no `--history` flag).
  • [✅] `muse merge` `--strategy` choices: `recursive`, `overlay`, `snapshot`, `replay`, `ours` (alias), `theirs` (alias).
  • [✅] One `MergeEngine` class shared by `muse merge` (local) and proposal merge (server).
  • [✅] JSON output shape identical for both paths.

Phase 2 — Conflict granularity specification

Muse detects conflicts at three levels. Each must be exercised by the test matrix.

Level What it covers Example
Directory A directory deleted on one side, files added inside it on the other `src/` deleted vs `src/new_module.py` added
File Same file modified on both sides, object IDs differ `config.py` modified differently on A and B
Symbol Same symbol modified on both sides within a file `config.py::MAX_CONNECTIONS` changed on both

Convergent-edit sub-cases (must never produce a conflict)

Sub-case Description
Untouched file File unchanged from base on both sides — object IDs identical
Convergent edit Both branches independently arrive at the same content (same object ID)
Convergent symbol Both branches modify the same symbol to the same `new_content_id`
Both deleted File deleted on both sides — clean, no conflict

Divergent-edit sub-cases (must produce a conflict)

Sub-case Description
File divergence Same file, different content, no symbol info
Symbol divergence Same symbol address, different `new_content_id`
Add/add collision Both sides add a file at the same path with different content
Delete/modify One side deletes a file, the other modifies it

Deliverables

One new test file: `tests/test_phase2_conflict_granularity.py`.

Convergent sub-cases — all must produce `conflicts == []` and a clean status:

  • [✅] CE_01 — Untouched file: both branches leave a file's object ID unchanged from base → no conflict detected.
  • [✅] CE_02 — Convergent edit: both branches independently write the same bytes to a file → no conflict, file present in merged snapshot.
  • [✅] CE_03 — Convergent symbol: both branches update the same Python symbol to identical source → no conflict detected (requires code plugin).
  • [✅] CE_04 — Both deleted: both branches delete the same file from base → no conflict, file absent from merged snapshot.

Divergent sub-cases — all must produce the conflicting path in `conflicts`:

  • [✅] DE_01 — File divergence: same path modified to different bytes on each side → path appears in `conflicts`.
  • [✅] DE_02 — Symbol divergence: same symbol modified to different source on each side → conflict detected (requires code plugin).
  • [✅] DE_03 — Add/add collision: both sides add the same path with different content → path appears in `conflicts`.
  • [✅] DE_04 — Delete/modify: one side removes a file, the other modifies it → path appears in `conflicts`.

Directory-level:

  • [✅] DIR_01 — Directory conflict: all files under `src/` deleted on one side; a new file added under `src/` on the other → conflict detected for the new file path.

[✅] All tests green with no regressions to PHANTOM_01–05 or Phase 1 tests.


Phase 3 — Full test matrix

Every cell is one integration test. Tests run identically whether executed via `muse merge` locally or simulated as a proposal merge.

Strategy resolution cheat sheet

Strategy diff_unit resolution Divergent conflict behaviour
`recursive` three_way escalate surfaces conflict
`overlay` snapshot prefer_theirs auto-resolves — theirs always wins, no conflict
`snapshot` snapshot escalate surfaces conflict
`replay` replay_ours escalate surfaces conflict

Minimum required test assertions

Each test must assert:

  1. Correct `conflicts` list (empty for convergent/overlay; populated for escalating divergent)
  2. Correct merged snapshot content (non-conflicting changes from both branches present)
  3. Correct commit graph shape (one parent for squash, two for merge, linear for rebase)

Deliverables

One new test file: `tests/test_phase3_strategy_matrix.py`.

Group 1 — Convergent cases under every strategy (all must produce `conflicts == []`):

  • [✅] SM_01 — `recursive` + untouched file → no conflict, untouched file in merged snapshot.
  • [✅] SM_02 — `overlay` + untouched file → no conflict.
  • [✅] SM_03 — `snapshot` + untouched file → no conflict.
  • [✅] SM_04 — `replay` + untouched file → no conflict.
  • [✅] SM_05 — `recursive` + convergent edit → no conflict, convergent content in merged snapshot.
  • [✅] SM_06 — `overlay` + convergent edit → no conflict, convergent content in merged snapshot.
  • [✅] SM_07 — `snapshot` + convergent edit → no conflict, convergent content in merged snapshot.
  • [✅] SM_08 — `replay` + convergent edit → no conflict, convergent content in merged snapshot.

Group 2 — Divergent cases: conflict surfacing vs auto-resolution:

  • [✅] SM_09 — `recursive` + file divergence → `conflicts` non-empty.
  • [✅] SM_10 — `overlay` + file divergence → `conflicts == []`, theirs wins in merged snapshot.
  • [✅] SM_11 — `snapshot` + file divergence → `conflicts` non-empty.
  • [✅] SM_12 — `replay` + file divergence → `conflicts` non-empty.
  • [✅] SM_13 — `recursive` + add/add collision → `conflicts` non-empty.
  • [✅] SM_14 — `overlay` + add/add collision → `conflicts == []`, theirs content wins.
  • [✅] SM_15 — `snapshot` + add/add collision → `conflicts` non-empty.
  • [✅] SM_16 — `recursive` + delete/modify → `conflicts` non-empty.
  • [✅] SM_17 — `overlay` + delete/modify → `conflicts == []`, modified version survives.
  • [✅] SM_18 — `snapshot` + delete/modify → `conflicts` non-empty.

Group 3 — History mode produces correct commit graph shape:

  • [✅] SM_19 — `--history merge` after a clean merge → commit has two parents (`parent2_commit_id` set).
  • [✅] SM_20 — `--history squash` after a clean merge → commit has one parent (`parent2_commit_id` is None).
  • [✅] SM_21 — `--history rebase` after a clean merge → commits are linear; no merge commit created.

Group 4 — PHANTOM regressions confirmed under all 4 strategies:

  • [✅] SM_22 — PHANTOM_01 (untouched file) passes under `recursive`, `overlay`, `snapshot`, `replay`.
  • [✅] SM_23 — PHANTOM_02 (convergent edit) passes under all 4 strategies; merged snapshot contains the convergent content.
  • [✅] SM_24 — PHANTOM_05 (clean merge snapshot has non-conflicting changes from both branches) under all 4 strategies.

All 24 tests green with no regressions to Phase 1, Phase 2, or PHANTOM_01–05.

Named regression tests (carry forward from #85)

  • PHANTOM_01: untouched file, strategy=overlay → no conflict
  • PHANTOM_02: convergent edit, strategy=overlay → no conflict, correct snapshot
  • PHANTOM_03: untouched file, strategy=snapshot → no conflict
  • PHANTOM_04: real divergence, strategy=snapshot → conflict correctly detected
  • PHANTOM_05: clean merge → snapshot has changes from both branches

Phase 4 — Harmony integration per history mode

Harmony learns conflict resolutions so future identical conflicts auto-resolve. Each history mode affects what Harmony can learn.

History mode Harmony behavior Issue
`merge` Full learning: per-file, per-symbol resolutions recorded against ours/theirs commit IDs None — ideal case
`squash` Partial learning: the squash commit has no `parent2_commit_id` — Harmony cannot key resolutions to the incoming branch's commit ID. It can still record file-level outcome blobs. Must be explicitly handled: record resolutions against the squash snapshot, not a commit ID
`rebase` No learning possible by default: rebased commits have new IDs; the original conflict context is destroyed. Harmony must be called during the rebase loop, once per replayed commit, not at the final result. Requires Harmony to be called inside the rebase engine, not just at commit time

Deliverables

  • [✅] `muse commit` Harmony recording path handles the squash case correctly (no `parent2_commit_id` present — fall back to snapshot-keyed recording). (HA_04)
  • `muse rebase` calls Harmony inside the replay loop. Deferred — `--history rebase` runs squash-equivalent in Phase 3; per-commit Harmony during replay not yet implemented.
  • [✅] Integration test: after a `--history squash` merge that had a conflict, re-running the same merge with the same conflict auto-resolves via Harmony. (HA_02)
  • [✅] Integration test: after a `--history rebase` merge with a conflict, the same conflict on a subsequent rebase auto-resolves. (HA_03 — squash-equivalent path; passes)

Phase 5 — Phantom conflict guard (universal)

The fix from #85 (`ops_commute` convergent-edit check) must hold under every strategy. Add a universal pre-conflict guard:

# Before recording any conflict for a path:
if ours_object_id == theirs_object_id:
    continue  # identical content — no conflict possible under any strategy

This guard must be applied:

  • In `CodePlugin.merge()` (file-level)
  • In `CodePlugin.merge_ops()` Step 1 (symbol-level, shared patch files)
  • In the server-side proposal merge path
  • Under ALL strategies — `ours_object_id == theirs_object_id` is a physical fact, not a strategy decision

Deliverables

  • [✅] Guard present and tested in all four locations above. tests/test_phase5_phantom_guard.py — 15 tests (PG_01–08).
  • [✅] PHANTOM_01–05 pass under every strategy in the matrix. (SM_22 covers PHANTOM_01/03 under all 4; SM_23 covers PHANTOM_02; SM_24 covers PHANTOM_05; SM_09/11/12 cover PHANTOM_04 under escalating strategies; SM_10 confirms overlay correctly auto-resolves — not a phantom. All 5 original tests green.)

Phase 6 — muse merge and proposal merge use one code path

Today the server-side proposal merge logic lives in musehub/services/proposal_merge_strategies.py and the local merge logic lives in muse/cli/commands/merge.py. They can drift.

  • [✅] Extract run_merge() (see Phase 1 architecture) in muse/core/merge_engine.py — pure function shared by both paths; handles all four diff_unit values and all three resolution policies.
  • [✅] run_merge() accepts base/ours/theirs manifests + MergeEngine + optional domain plugin and returns muse.domain.MergeResult. CLI and server each handle their own I/O wrapping.
  • [✅] muse/cli/commands/merge.py Cases 3 and 4 now delegate to run_merge(). musehub/services/proposal_merge_strategies.py canonical strategy routing delegates to run_merge() + _to_local_result() conversion. Also added ours/theirs to STRATEGY_MAP (were missing, causing reported_strategy to show recursive instead of ours).
  • [✅] tests/test_phase6_unified_merge_engine.py — 12 tests (ME_01–ME_12) covering all diff_units, resolution policies, clean merges, and musehub canonical routes (overlay, recursive, replay).

Phase 7 — Step 1.5 independence merge correctness + Harmony confidence gating

Two correctness gaps identified during Phase 3/4 implementation that are not covered by Phases 5 or 6.

Gap 1 — Step 1.5 silent union-merge on unparseable files

What happens today: In CodePlugin.merge_ops, Step 1.5 checks whether all symbol-level changes commute before attempting _independence_merge_blob. The check is:

file_has_symbol_conflict = any(
    not ops_commute(oc, tc)
    for oc in our_patch["child_ops"]
    for tc in their_patch["child_ops"]
)

When a file cannot be parsed as Python (SyntaxError), _semantic_ops still returns a PatchOp with child_ops = []. The loop above is vacuously False (no pairs to compare), so _independence_merge_blob fires. That function runs a three-way text merge; if both sides changed the same line differently, it union-resolves them — silently writing both lines into the output (e.g. config = ours version\nconfig = theirs version\n) without surfacing a conflict. This is data corruption on real divergence in unparseable files.

This is distinct from the phantom conflict guard in Phase 5 (which handles the convergent case ours_id == theirs_id). Phase 5 does not address this path.

The fix: In Step 1.5's candidate loop, skip the independence path when both sides have empty child_ops AND neither patch was derived from a successful parse. Condition to add:

# Don't union-merge when neither side produced any symbols — the file couldn't
# be parsed, so "no symbol conflicts" means "we have no information", not "safe to merge".
if not our_patch["child_ops"] and not their_patch["child_ops"]:
    continue

Gap 2 — Harmony auto-apply ignores confidence score

What happens today: harmony_auto_apply calls best_resolution(root, pattern_id) which ranks by (human_verified, confidence, applied_count) and returns the top result — then applies it unconditionally regardless of confidence score. A pattern learned from muse checkout --ours (confidence=0.8, human_verified=False) replays forever at the same strength as a carefully hand-edited resolution (confidence=1.0, human_verified=True).

The fix: Add a configurable minimum confidence threshold to harmony_auto_apply. Patterns below the threshold escalate (surface as conflicts) rather than auto-applying silently. The threshold is a repo-level config key with a sensible default.

# In harmony_auto_apply, after best_resolution():
min_confidence = get_config_float("harmony.min_auto_apply_confidence", root, def                    qqqqqqault=0.85)
if best.confidence < min_confidence:
    continue  # escalate — let the user resolve and improve confidence

The default of 0.85 means --ours/--theirs resolutions (0.8) do not auto-apply by default; only manually-verified resolutions (1.0) do.

Deliverables

One new test file: tests/test_phase7_merge_correctness.py.

Step 1.5 fix:

  • [✅] IC_01 — Divergent unparseable Python (SyntaxError on both sides) → conflict surfaces correctly; no silent union-merge.
  • [✅] IC_02 — Divergent valid Python (symbol changed on both sides, same file) → independence merge fires correctly and produces a clean merge when symbols don't overlap.
  • [✅] IC_03 — Divergent valid Python (same symbol changed on both sides) → conflict surfaces; independence merge does NOT fire.
  • [✅] IC_04CodePlugin.merge_ops Step 1.5 skipped when child_ops empty on both sides (unit test on the skip condition).

Harmony confidence gating:

  • [✅] HC_01 — Resolution with confidence >= 0.85 (default threshold) auto-applies on re-merge.
  • [✅] HC_02 — Resolution with confidence < 0.85 (e.g. from --ours checkout, confidence=0.8) does NOT auto-apply; conflict surfaces for manual resolution.
  • [✅] HC_03harmony.min_auto_apply_confidence config key overrides the default; setting it to 0.75 causes the 0.8-confidence pattern to auto-apply.
  • [✅] HC_04 — When no resolution meets the confidence threshold, Harmony escalates cleanly (returns the conflict unresolved, does not error).

All 8 tests green with no regressions to Phases 1–4 or PHANTOM_01–05.


Phase 8 — muse merge --explain

The merge pipeline makes several invisible decisions — strategy routing, per-path case selection, Harmony pattern lookup, .museattributes rule firing — that are currently only visible in debug logs. When a merge produces a surprising result (phantom conflict, unexpected auto-resolution, wrong content in the merged snapshot), there is no way for a user or agent to see why without reading source code.

--explain makes the decision path first-class output.

Design

--explain is a flag on muse merge. It augments the existing output rather than replacing it:

  • muse merge feat --explain — human-readable decision trace to stdout alongside normal merge output.
  • muse merge feat --explain --json — structured machine-readable trace embedded in the JSON envelope; agents can parse it to understand exactly what fired and why.
  • muse merge feat --explain --dry-run — full explanation of what would happen without writing anything.

--explain does not change merge behavior — it only adds observation.

JSON shape (explain key in merge envelope)

"explain": {
  "strategy_routing": {
    "requested_strategy": "recursive",
    "on_conflict": null,
    "history": "merge",
    "resolved_diff_unit": "three_way",
    "resolved_resolution": "escalate",
    "case": 4
  },
  "merge_base_commit_id": "sha256:...",
  "per_path": [
    {
      "path": "config.py",
      "ours_changed": true,
      "theirs_changed": true,
      "ours_id": "sha256:...",
      "theirs_id": "sha256:...",
      "base_id": "sha256:...",
      "decision": "conflict",
      "reason": "both sides modified; content diverged",
      "harmony_checked": true,
      "harmony_pattern_id": null,
      "harmony_result": "no_pattern",
      "attributes_rule": null,
      "independence_merge_attempted": false
    },
    {
      "path": "README.md",
      "ours_changed": false,
      "theirs_changed": true,
      "decision": "take_theirs",
      "reason": "only theirs changed",
      "harmony_checked": false,
      "harmony_pattern_id": null,
      "harmony_result": "not_applicable",
      "attributes_rule": null,
      "independence_merge_attempted": false
    }
  ],
  "summary": {
    "total_paths": 5,
    "clean_no_change": 2,
    "take_ours_only": 0,
    "take_theirs_only": 1,
    "convergent": 1,
    "harmony_auto_resolved": 0,
    "attributes_auto_resolved": 0,
    "independence_merged": 0,
    "conflicts": 1
  }
}

decision values: "no_change" | "take_ours_only" | "take_theirs_only" | "convergent" | "harmony_auto_resolved" | "attributes_auto_resolved" | "independence_merged" | "conflict" | "overlay_auto_resolved".

harmony_result values: "not_applicable" | "no_pattern" | "below_confidence_threshold" | "auto_resolved" | "escalated".

Human-readable format

muse merge feat --explain

  Merge base: sha256:cc521... (2 commits behind)
  Strategy:   recursive (three_way + escalate) → Case 4

  config.py       CONFLICT    both sides modified; diverged
    harmony:      no pattern found
  README.md       take-theirs only theirs changed
  shared.py       no-change   identical on both sides
  utils.py        convergent  both sides independently arrived at same content
  billing.py      harmony     auto-resolved via pattern a3f2c9 (confidence 1.0)

  1 conflict · 1 harmony auto-resolved · 3 clean

Deliverables

One new test file: tests/test_phase8_explain.py.

  • [✅] EX_01--explain flag accepted on muse merge; no error, no change to exit code or merge behavior.
  • [✅] EX_02--explain --dry-run --json produces an explain key in the JSON envelope without writing any commits or MERGE_STATE.
  • [✅] EX_03explain.strategy_routing contains requested_strategy, resolved_diff_unit, resolved_resolution, case.
  • [✅] EX_04explain.per_path entry for a conflicting file has decision == "conflict", ours_changed == true, theirs_changed == true, ours_id != theirs_id.
  • [✅] EX_05explain.per_path entry for a theirs-only change has decision == "take_theirs_only", ours_changed == false, theirs_changed == true.
  • [✅] EX_06explain.per_path entry for a convergent edit has decision == "convergent", ours_changed == true, theirs_changed == true, ours_id == theirs_id.
  • [✅] EX_07explain.per_path entry for an untouched file has decision == "no_change", harmony_checked == false.
  • [✅] EX_08 — After Harmony has a pattern for a conflict, re-merging with --explain --json shows harmony_result == "auto_resolved" and harmony_pattern_id is set for the resolved path.
  • [✅] EX_09explain.summary counts are correct: conflicts, harmony_auto_resolved, convergent, clean_no_change all match the actual per_path entries.
  • [✅] EX_10--explain without --json produces human-readable output containing the merge base commit ID, strategy line, and one line per changed path.

All 10 tests green with no regressions to Phases 1–4, 7, or PHANTOM_01–05.


Acceptance criteria

  • [✅] `muse merge --strategy X --on-conflict Y --history Z` works for every valid combination. (Phase 1 + Phase 3 SM_01–SM_24)
  • [✅] Every cell in the Phase 3 matrix has a passing test. (SM_01–SM_24)
  • [✅] PHANTOM_01–05 pass under every strategy. (Phase 5 + SM_22–SM_24)
  • [✅] Harmony learns correctly from `merge`, `squash`, and `rebase` history modes. (HA_02–HA_04; per-commit rebase loop explicitly deferred in Phase 4)
  • [✅] `muse merge` and `muse hub proposal merge` produce bit-identical merged snapshots given the same inputs. (Phase 6 ME_01–ME_12)
  • [✅] No phantom conflicts under any strategy or history mode. (Phase 5 + SM_22–SM_24)
  • [✅] No silent union-merge on unparseable files (Phase 7 Step 1.5 fix). (IC_01–IC_04)
  • [✅] Harmony auto-apply respects confidence threshold; low-confidence patterns escalate. (HC_01–HC_04)
  • [✅] `muse merge --explain --json` exposes full per-path decision trace for every path touched by the merge. (EX_01–EX_10)
  • [✅] Existing merge tests unaffected. (test_cmd_merge.py passes throughout all phases)

Out of scope

  • Muse wire protocol changes (push validation, pack format, object store).
  • Changes to Harmony's storage format or pattern-matching algorithms.
  • New domain plugins.
Activity
gabriel opened this issue 8 days ago
closed this issue 7 days ago