gabriel / musehub public
Closed #14 feat
filed by gabriel human · 47 days ago

feat(intel): coupling GUI — file co-change heatmap


Overview

Surface muse code coupling in the Intelligence Hub as a ranked file co-change heatmap. The CLI already produces the data, and the worker already has a CouplingProvider, but it shells out through a _run_muse subprocess. This issue replaces the subprocess with a pure-SQL BFS algorithm (same pattern as EntangleProvider), adds indexes, builds the /intel/coupling list page with heat-intensity bars, wires a dashboard card, and delivers a 7-tier test suite.

CLI output shape (verified against muse code coupling --json on this repo):

{
  "pairs": [
    { "file_a": "musehub/api/routes/wire.py",
      "file_b": "musehub/services/musehub_wire.py",
      "co_changes": 33 },
    { "file_a": "musehub/models/musehub.py",
      "file_b": "musehub/services/musehub_repository.py",
      "co_changes": 19 },
    ...
  ]
}

Top pairs for this repo: wire.py ↔ musehub_wire.py (33), models ↔ repository (19), models/wire ↔ musehub_wire (16). Only 20 pairs total — a tight, readable signal.
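
For the data-parity check later in this issue, the CLI payload can be ranked with a few lines of Python. This is a minimal sketch: the helper name top_pairs is hypothetical, and the sample embeds only the two pairs quoted above rather than the full 20.

```python
import json

# Two pairs copied from the CLI output above; the real payload has more.
SAMPLE = """
{
  "pairs": [
    {"file_a": "musehub/models/musehub.py",
     "file_b": "musehub/services/musehub_repository.py",
     "co_changes": 19},
    {"file_a": "musehub/api/routes/wire.py",
     "file_b": "musehub/services/musehub_wire.py",
     "co_changes": 33}
  ]
}
"""


def top_pairs(payload: str, limit: int = 3) -> list[tuple[str, str, int]]:
    """Return (file_a, file_b, co_changes) tuples, strongest coupling first."""
    pairs = json.loads(payload)["pairs"]
    ranked = sorted(pairs, key=lambda p: p["co_changes"], reverse=True)
    return [(p["file_a"], p["file_b"], p["co_changes"]) for p in ranked[:limit]]
```

Comparing this list against the rows stored in musehub_intel_coupling is one way to exercise the "values match exactly" acceptance criterion.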


Web UI Wireframe

┌─────────────────────────────────────────────────────────────────────────┐
│  ⚡ COUPLING                                         gabriel/musehub    │
│  File pairs that co-change most frequently — structural coupling signal  │
├─────────────────────────────────────────────────────────────────────────┤
│  PAIRS  20   REF  sha256:cedbb6f8   BUILT  2026-05-03                   │
├─────────────────────────────────────────────────────────────────────────┤
│  MIN CO-CHANGES ≥ [ 2  ]   SHOW [ 50 ▾]              [ Apply ]         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │  musehub/api/routes/wire.py  ↔  musehub/services/musehub_wire.py   │ │
│ │  ████████████████████████████████████████████████████  33          │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │  musehub/models/musehub.py  ↔  musehub/services/musehub_repository │ │
│ │  ████████████████████████████████████████████          19          │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │  musehub/models/wire.py  ↔  musehub/services/musehub_wire.py       │ │
│ │  ██████████████████████████████████████                16          │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │  musehub/mcp/dispatcher.py  ↔  musehub/mcp/tools/musehub.py        │ │
│ │  █████████████████████████████████████                 15          │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│                                                                         │
│  Heat key:  ░░ low   ▒▒ medium   ▓▓ high   ██ critical                 │
└─────────────────────────────────────────────────────────────────────────┘

Theme elements used:

  • --bg-surface / --bg-elevated / --bg-hover — row surfaces
  • --border-default / --border-subtle — list dividers
  • --color-accent → low coupling bar fill
  • --color-warning → medium coupling (≥ 10 co-changes)
  • --color-danger → high coupling (≥ 20 co-changes)
  • --font-mono — file paths, counts
  • --gradient-spectral — optional heat-key decorative strip
  • intel-page-header / intel-meta-bar / intel-meta-pill — standard Intel Hub header pattern

Current state

What                              Status
musehub_intel_coupling table      ✅ exists (repo_id, file_a, file_b, co_changes, ref)
CouplingProvider                  ⚠️ exists but uses _run_muse subprocess — breaks in environments without a local repo
DB indexes                        ❌ only ix_intel_coupling_repo — missing co_changes/file indexes
/intel/coupling route + template  ❌ not implemented
Dashboard card                    ❌ not wired
Tests                             ❌ none

Phase 0 — Rewrite CouplingProvider to pure SQL

Replace the _run_muse subprocess call with a BFS commit walk over musehub_symbol_history_entries, identical in structure to EntangleProvider.

Algorithm (mirrors muse code coupling exactly):

1. Fetch all commits for repo → commit_parents dict
2. BFS from HEAD, cap at _MAX_WALK = 10,000 commits
3. Bulk-fetch history entries → (commit_id, address) pairs
4. For each commit in walk:
   - Derive file = address.split("::")[0]  (or bare address if no "::")
   - Skip entries where file is empty or starts with special prefixes
   - If len(distinct files in commit) > _MAX_FILES_PER_COMMIT (200) → skip (mass commit)
   - For each pair (file_a, file_b) where file_a < file_b → pair_co_changes[(a,b)] += 1
5. Filter: co_changes >= _MIN_CO_CHANGES (2); file_a != file_b is already guaranteed by the strict file_a < file_b ordering
6. Sort by co_changes DESC
7. Truncate to _MAX_PAIRS (200)
8. DELETE stale rows for repo, upsert fresh set
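
Steps 2 to 7 can be sketched in memory as follows. This is an illustration under assumptions, not the provider itself: the function names are hypothetical, the inputs are plain dicts and tuples rather than query results, the special-prefix filter from step 4 is left out because the prefixes are not listed here, and the tie-break on equal counts is a choice made only to keep the output deterministic.

```python
from collections import Counter, deque
from itertools import combinations

_MAX_WALK = 10_000
_MAX_FILES_PER_COMMIT = 200
_MIN_CO_CHANGES = 2
_MAX_PAIRS = 200


def bfs_walk(head, commit_parents, max_walk=_MAX_WALK):
    """Return commit ids reachable from head, breadth-first, capped at max_walk."""
    seen = {head}
    order = []
    queue = deque([head])
    while queue and len(order) < max_walk:
        commit_id = queue.popleft()
        order.append(commit_id)
        for parent in commit_parents.get(commit_id, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return order


def file_of(address):
    """'src/a.py::fn' -> 'src/a.py'; a bare address is returned unchanged."""
    return address.split("::")[0]


def coupling_pairs(head, commit_parents, entries):
    """entries: iterable of (commit_id, address). Returns [((a, b), count), ...]."""
    files_by_commit = {}
    for commit_id, address in entries:
        file = file_of(address)
        if file:  # empty files are skipped; special-prefix filtering omitted
            files_by_commit.setdefault(commit_id, set()).add(file)

    counts = Counter()
    for commit_id in bfs_walk(head, commit_parents):
        files = files_by_commit.get(commit_id, set())
        if len(files) > _MAX_FILES_PER_COMMIT:
            continue  # mass commit: would add O(N^2) noise
        # Sorting first guarantees a < b in every emitted pair.
        counts.update(combinations(sorted(files), 2))

    kept = [(pair, n) for pair, n in counts.items() if n >= _MIN_CO_CHANGES]
    kept.sort(key=lambda item: (-item[1], item[0]))  # tie-break is an assumption
    return kept[:_MAX_PAIRS]
```

Because the file set is deduplicated per commit, two symbols in the same file never form a pair, which is the behaviour CP_04 and CP_14 expect.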

Key differences from EntangleProvider:

  • File-level not symbol-level — file = address.split("::")[0] (bare paths are valid here)
  • No import-symbol filter needed (working at file level)
  • No Jaccard rate — raw co_changes count is the signal
  • _MAX_FILES_PER_COMMIT = 200 (tighter than the symbol-level 500: a commit touching more than 200 files is a mass import or scaffolding, not signal)
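
Step 8 of the algorithm (DELETE stale rows, then write the fresh set) might look like the sketch below. It uses the standard-library sqlite3 module as a stand-in for the real database layer, which this issue does not show. The column names come from the Current state table; the column types, the UNIQUE constraint, and the function name replace_pairs are assumptions. A plain INSERT is used instead of a true upsert because the preceding DELETE already cleared the repo's rows.

```python
import sqlite3

# Column names are from the issue; types and the UNIQUE constraint are assumed.
DEMO_SCHEMA = """
CREATE TABLE musehub_intel_coupling (
    repo_id    TEXT NOT NULL,
    file_a     TEXT NOT NULL,
    file_b     TEXT NOT NULL,
    co_changes INTEGER NOT NULL,
    ref        TEXT NOT NULL,
    UNIQUE (repo_id, file_a, file_b)
)
"""


def replace_pairs(conn, repo_id, ref, pairs):
    """Swap the stored pair set for one repo inside a single transaction.

    pairs: iterable of ((file_a, file_b), co_changes).
    Returns the number of rows written.
    """
    rows = [(repo_id, a, b, n, ref) for (a, b), n in pairs]
    with conn:  # commit on success, roll back if any statement raises
        conn.execute(
            "DELETE FROM musehub_intel_coupling WHERE repo_id = ?", (repo_id,)
        )
        # Bound parameters keep hostile file paths inert (see CP_39).
        conn.executemany(
            "INSERT INTO musehub_intel_coupling "
            "(repo_id, file_a, file_b, co_changes, ref) VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)
```

Running it twice with the same input leaves identical rows, and a later run with a new ref rewrites the ref on every row, which lines up with CP_33, CP_34 and CP_43.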

Docstring (load-bearing):

class CouplingProvider:
    """Persist co-changing file pairs by mining musehub_symbol_history_entries.

    Mirrors ``muse code coupling`` exactly — same BFS commit walk, same
    mass-commit exclusion, same minimum co-change threshold.

    Algorithm
    ---------
    1. BFS-walk commits from HEAD (cap _MAX_WALK).
    2. Bulk-fetch all history entries for this repo.
    3. For each commit, derive the touched file set by splitting each address
       on ``::`` and taking the left part.  Bare-path entries (no ``::``) are
       treated as file paths directly — unlike EntangleProvider which filters
       them out, because at the file level they are valid.
    4. Skip commits where the distinct file count exceeds _MAX_FILES_PER_COMMIT
       (mass scaffolding / import commits produce O(N²) noise).
    5. For each qualifying commit, accumulate pair_co_changes[(a, b)] for every
       unordered file pair (a < b lexicographically).
    6. Filter: co_changes >= _MIN_CO_CHANGES, then sort DESC, truncate to
       _MAX_PAIRS.
    7. DELETE stale rows, upsert fresh set.

    Constants
    ---------
    _MAX_WALK              = 10_000   cap on commits visited by the BFS
    _MAX_FILES_PER_COMMIT  = 200      mass-commit guard
    _MAX_PAIRS             = 200      stored leaderboard size
    _MIN_CO_CHANGES        = 2        noise floor
    """

Phase 1 — Migration 0011: add indexes

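# alembic/versions/0011_coupling_indexes.py
from alembic import op

revision = "0011"
down_revision = "0010"

def upgrade():
    # Leaderboard read: filter by repo, order by co_changes.
    op.create_index("ix_intel_coupling_repo_co", "musehub_intel_coupling",
                    ["repo_id", "co_changes"])
    # Lookups by left-hand file within a repo.
    op.create_index("ix_intel_coupling_repo_file_a", "musehub_intel_coupling",
                    ["repo_id", "file_a"])

def downgrade():
    op.drop_index("ix_intel_coupling_repo_file_a", table_name="musehub_intel_coupling")
    op.drop_index("ix_intel_coupling_repo_co", table_name="musehub_intel_coupling")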

Phase 2 — SCSS

Two-file split (structural/visual) following the established pattern:

src/scss/components/_coupling.scss — visual only

.cp-list          border + radius surface
.cp-pair-row      divider, hover tint
.cp-file-a        muted — left file path (font-mono)
.cp-file-b        accent-link — right file path (font-mono)
.cp-arrow         muted ↔ separator
.cp-count         bold mono right-aligned
.cp-bar-track     bg-elevated rail
.cp-bar-fill      accent base fill
  &--medium       warning fill  (co_changes >= 10)
  &--high         danger fill   (co_changes >= 20)
.cp-filter-label  uppercase muted label
.cp-empty-state   centered muted with icon

src/scss/pages/_coupling.scss — layout only

.cp-wrap          padding:0
.intel-page-header  margin-bottom (same rule as stable/dead)
.cp-filter-bar    flex row, gap, margin-bottom
.cp-filter-group  flex align-center, gap
.cp-list          flex-col
.cp-pair-row      grid 1fr auto / auto auto; padding 0.75rem 1rem
.cp-files         grid-col 1, row 1; flex row, gap, min-width 0, overflow hidden
.cp-stats         grid-col 2, row 1; flex-col, align-end
.cp-bar-wrap      grid-col 1/-1, row 2; height 3px

Wire into app.scss:

@use "components/coupling";
@use "pages/coupling" as page-coupling;

Phase 3 — Route + template

Route: GET /{owner}/{repo_slug}/intel/coupling

async def intel_coupling_page(request, owner, repo_slug, db,
                               min_co: int = 2,
                               top: int = 50):
    """
    Render the file co-change coupling leaderboard.

    Reads from musehub_intel_coupling ordered by co_changes DESC.
    Applies min_co filter in SQL.  Computes bar widths server-side (no
    client-side script) by normalising against the top pair's co_changes.

    Parameters
    ----------
    min_co : int
        Minimum co-change count to include (default 2, noise floor).
    top : int
        Maximum pairs to display (choices: 25, 50, 100, 200).

    Context variables
    -----------------
    pairs         list of dicts — file_a, file_b, short_a, short_b,
                  co_changes, bar_pct, heat_modifier
    total_count   int — total stored pairs before filter
    min_co        int — current filter value
    selected_top  int — current page size
    valid_tops    list[int] — [25, 50, 100, 200]
    index_meta    IndexMeta | None
    """

Heat modifier logic:

def _cp_heat(co_changes: int) -> str:
    if co_changes >= 20: return "high"
    if co_changes >= 10: return "medium"
    return ""
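
Putting the route's read path together (min_co filter in SQL, ordering by co_changes DESC, bar widths normalised against the top pair, heat modifier per row) could look roughly like this. It is a sketch with sqlite3 standing in for the real session. The name load_leaderboard, the fallback to 50 for an unsupported top value, the secondary sort on file names, and rounding bar_pct to a whole number are all assumptions.

```python
import sqlite3

VALID_TOPS = (25, 50, 100, 200)


def _cp_heat(co_changes: int) -> str:
    """Same thresholds as the heat modifier logic above."""
    if co_changes >= 20:
        return "high"
    if co_changes >= 10:
        return "medium"
    return ""


def load_leaderboard(conn, repo_id, min_co=2, top=50):
    """Fetch filtered pairs and attach bar_pct / heat_modifier for the template."""
    if top not in VALID_TOPS:
        top = 50  # assumption: unsupported page sizes fall back to the default
    rows = conn.execute(
        "SELECT file_a, file_b, co_changes FROM musehub_intel_coupling "
        "WHERE repo_id = ? AND co_changes >= ? "
        "ORDER BY co_changes DESC, file_a, file_b LIMIT ?",
        (repo_id, min_co, top),
    ).fetchall()
    if not rows:
        return []
    # Rows are sorted, so the first one is the strongest pair; the max()
    # guard avoids dividing by zero if a zero count is ever stored.
    top_co = max(rows[0][2], 1)
    return [
        {
            "file_a": a,
            "file_b": b,
            "co_changes": n,
            "bar_pct": round(100 * n / top_co),
            "heat_modifier": _cp_heat(n),
        }
        for a, b, n in rows
    ]
```

Note that bar_pct here is relative to the strongest pair that survives the filter, so the first visible row always renders at full width.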

Template: intel_coupling.html

{% extends "musehub/base.html" %}
breadcrumb: owner / repo / intel / coupling

<header class="intel-page-header">
  {{ icon("zap", 16) }} Coupling
  <p>File pairs that co-change most frequently — structural coupling signal.</p>
</header>

intel-meta-bar: pairs | ref | built

<form> min_co input + top select + Apply button </form>

<div class="cp-list">
  {% for p in pairs %}
  <div class="cp-pair-row">
    <div class="cp-files">
      <span class="cp-file-a font-mono">{{ p.short_a }}</span>
      <span class="cp-arrow">↔</span>
      <span class="cp-file-b font-mono">{{ p.short_b }}</span>
    </div>
    <span class="cp-count font-mono">{{ p.co_changes | fmtnum }}</span>
    <div class="cp-bar-wrap">
      <div class="cp-bar-track">
        <div class="cp-bar-fill{% if p.heat_modifier %} cp-bar-fill--{{ p.heat_modifier }}{% endif %}"
             style="width:{{ p.bar_pct }}%"></div>
      </div>
    </div>
  </div>
  {% endfor %}
</div>

File paths are truncated to their last two components for display: musehub/services/musehub_wire.py → services/musehub_wire.py
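
A possible helper for the short_a / short_b values used in the template, keeping the last two path components. The name short_path and the keep parameter are hypothetical.

```python
def short_path(path: str, keep: int = 2) -> str:
    """Return the last `keep` slash-separated components of a repo path."""
    parts = [p for p in path.split("/") if p]
    return "/".join(parts[-keep:])
```

Paths with fewer than two components come back unchanged, so a top-level file such as README.md still displays in full.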


Phase 4 — Dashboard card

Add a 6th card to .intel-cards on the dashboard (after entangle):

┌─────────────────────┐
│ ⚡ COUPLING         │       View all →
├─────────────────────┤
│  20 pairs           │
│                     │
│ routes/wire ↔       │
│   services/wire  33 │
│                     │
│ models/musehub ↔    │
│   services/repo  19 │
│                     │
│ models/wire ↔       │
│   services/wire  16 │
└─────────────────────┘

Update .intel-cards grid: repeat(5, 1fr) → repeat(6, 1fr). New breakpoints: 1400px → 3col, 960px → 2col, 540px → 1col.

Route adds coupling_count + coupling_preview (top 3 non-test pairs) to dashboard context.
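
The preview selection could be sketched as below. The issue does not define what counts as a test file, so the _looks_like_test heuristic (a tests directory in the path, or a test_ filename prefix) is purely an assumption, as are both function names.

```python
def _looks_like_test(path: str) -> bool:
    """Assumed heuristic: under a tests/ directory or named test_*."""
    parts = path.split("/")
    return "tests" in parts[:-1] or parts[-1].startswith("test_")


def coupling_preview(pairs, limit=3):
    """Pick the first `limit` pairs where neither side looks like a test file.

    pairs: (file_a, file_b, co_changes) tuples already sorted by co_changes DESC.
    """
    preview = []
    for file_a, file_b, co_changes in pairs:
        if _looks_like_test(file_a) or _looks_like_test(file_b):
            continue
        preview.append((file_a, file_b, co_changes))
        if len(preview) == limit:
            break
    return preview
```

coupling_count would still report the total number of stored pairs, so the card header and the preview rows can legitimately differ when test files rank highly.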


Phase 5 — Test suite (CP_01–CP_49)

Tier 1 — Unit (CP_01–CP_08)

CP_01  file extraction from symbol address  "src/a.py::fn" → "src/a.py"
CP_02  bare path treated as file            "cloudflare" → "cloudflare"
CP_03  pair key canonical a < b             ("z.py", "a.py") → ("a.py", "z.py")
CP_04  same-file pair excluded              "src/a.py::fn1" + "src/a.py::fn2"  → no pair
CP_05  heat modifier ""    for co < 10
CP_06  heat modifier "medium" for co = 10..19
CP_07  heat modifier "high"  for co >= 20
CP_08  _MIN_CO_CHANGES constant == 2

Tier 2 — Integration (CP_09–CP_18)

CP_09  empty repo → no pairs
CP_10  no history entries → no pairs
CP_11  single co-change commit → co_changes=1 → below threshold, no row
CP_12  two co-change commits → co_changes=2 → one pair stored
CP_13  three files in commit → 3 pairs (A↔B, A↔C, B↔C)
CP_14  same-file symbols (two fns in same file) → no pair
CP_15  pair key stored canonical (a < b)
CP_16  ref column populated correctly
CP_17  co_changes count exact
CP_18  bar_pct = 100 for top pair

Tier 3 — E2E (CP_19–CP_25)

CP_19  three files across 5 commits → correct ranking
CP_20  top pair has bar_pct = 100, second pair proportional
CP_21  result metadata: key="intel.code.coupling", count matches stored rows
CP_22  truncated=True when over MAX_PAIRS
CP_23  min_co filter removes low-signal pairs from route response
CP_24  top=25 returns at most 25 rows
CP_25  heat_modifier "high" on pairs with co_changes >= 20

Tier 4 — Performance (CP_26–CP_32)

CP_26  10 commits × 10 files → completes < 500ms
CP_27  100 commits × 20 files → completes < 2s
CP_28  empty repo fast-path → < 50ms
CP_29  second run at most 5× slower than first
CP_30  point lookup (fetch pairs for repo) < 10ms after provider run
CP_31  200-pair leaderboard rendered in route < 200ms
CP_32  dashboard preview query < 20ms

Tier 5 — State integrity (CP_33–CP_38)

CP_33  idempotent: two runs produce identical rows
CP_34  stale rows purged on re-run (DELETE before upsert)
CP_35  incremental: new commits add new pairs on re-run
CP_36  no duplicate (file_a, file_b) rows after 3 runs
CP_37  co_changes increases when more co-change commits added
CP_38  truncated flag False when pairs ≤ MAX_PAIRS

Tier 6 — Security (CP_39–CP_44)

CP_39  SQL injection in file path stored verbatim, table survives
CP_40  XSS payload in file path stored safely
CP_41  repo A pairs never visible in repo B query
CP_42  two repos each get independent pair sets
CP_43  re-run for new ref updates ref column on all rows
CP_44  unicode in file path handled without crash

Tier 7 — Stress (CP_45–CP_49)

CP_45  MAX_PAIRS cap: 50 files × 3 commits → stored ≤ MAX_PAIRS
CP_46  mass-commit exclusion: commit with >200 files skipped
CP_47  500 commits × 5 files → completes without error
CP_48  result count matches stored rows
CP_49  BFS walk cap: commits_analysed ≤ MAX_WALK

Acceptance criteria

  • CouplingProvider uses pure-SQL BFS — no _run_muse, no local repo required
  • Migration 0011 adds ix_intel_coupling_repo_co + ix_intel_coupling_repo_file_a
  • /intel/coupling page renders from DB, median load < 200ms
  • Heat intensity bars: accent (low) / warning (medium ≥10) / danger (high ≥20)
  • File paths truncated to last 2 components in display
  • Dashboard 6th card wired; grid updated to 6-col
  • 49 tests across 7 tiers, all green on python3 -m pytest tests/test_coupling_provider.py
  • Data parity: GUI co_changes values match muse code coupling --json output exactly
  • No regressions on existing intel pages
Activity
gabriel opened this issue 47 days ago
gabriel 47 days ago

Duplicate of #15. Closing.