gabriel / musehub public
Closed #15 feat
filed by gabriel human · 48 days ago

feat(intel): coupling GUI — file co-change heatmap


Overview

Surface muse code coupling in the Intelligence Hub as a ranked file co-change heatmap. The CLI already produces the data, and the worker already has a CouplingProvider, but that provider shells out through the _run_muse subprocess. This issue replaces the subprocess with a pure-SQL BFS algorithm (same pattern as EntangleProvider), adds indexes, builds the /intel/coupling list page with heat intensity bars, wires a dashboard card, and delivers a 7-tier test suite.

CLI output shape (verified against muse code coupling --json on this repo):

{ "pairs": [
    { "file_a": "musehub/api/routes/wire.py",
      "file_b": "musehub/services/musehub_wire.py",
      "co_changes": 33 },
    { "file_a": "musehub/models/musehub.py",
      "file_b": "musehub/services/musehub_repository.py",
      "co_changes": 19 }
] }

Top pair for this repo: wire.py ↔ musehub_wire.py (33). Each pair carries exactly three fields (file_a, file_b, co_changes), and both sides are file paths, never symbol addresses.
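
For a parity check or test fixture, the shape above can be parsed into typed records. This is a minimal sketch: `CouplingPair` and `parse_coupling` are hypothetical names, not part of muse or musehub, and the sample is the excerpt shown above.

```python
import json
from typing import NamedTuple


class CouplingPair(NamedTuple):
    file_a: str
    file_b: str
    co_changes: int


def parse_coupling(payload: str) -> list[CouplingPair]:
    """Parse `muse code coupling --json` output into typed pairs."""
    doc = json.loads(payload)
    pairs = []
    for raw in doc["pairs"]:
        # Reject anything other than the three documented fields.
        if set(raw) != {"file_a", "file_b", "co_changes"}:
            raise ValueError(f"unexpected pair shape: {sorted(raw)}")
        pairs.append(
            CouplingPair(raw["file_a"], raw["file_b"], int(raw["co_changes"]))
        )
    return pairs


SAMPLE = """
{ "pairs": [
    { "file_a": "musehub/api/routes/wire.py",
      "file_b": "musehub/services/musehub_wire.py",
      "co_changes": 33 },
    { "file_a": "musehub/models/musehub.py",
      "file_b": "musehub/services/musehub_repository.py",
      "co_changes": 19 }
] }
"""

top = parse_coupling(SAMPLE)[0]
print(top.co_changes)  # 33
```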


Web UI Wireframe

┌─────────────────────────────────────────────────────────────────────────┐
│  ⚡ COUPLING                                         gabriel/musehub    │
│  File pairs that co-change most frequently — structural coupling signal  │
├─────────────────────────────────────────────────────────────────────────┤
│  PAIRS  20   REF  sha256:cedbb6f8   BUILT  2026-05-03                   │
├─────────────────────────────────────────────────────────────────────────┤
│  MIN CO-CHANGES ≥ [  2  ]   SHOW [ 50 ▾ ]                  [ Apply ]   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │  routes/wire.py  ↔  services/musehub_wire.py               33      │ │
│ │  ████████████████████████████████████████████████████████          │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │  models/musehub.py  ↔  services/musehub_repository.py      19      │ │
│ │  ████████████████████████████████████████████               │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │  models/wire.py  ↔  services/musehub_wire.py                16      │ │
│ │  ████████████████████████████████████                               │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │  mcp/dispatcher.py  ↔  mcp/tools/musehub.py                 15      │ │
│ │  ████████████████████████████████                                   │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│                                                                         │
│  Heat:  ░ low (< 10)   ▒ medium (10–19)   ▓ high (≥ 20)               │
└─────────────────────────────────────────────────────────────────────────┘

Theme tokens used:

  • --bg-surface / --bg-elevated / --bg-hover — list and bar surfaces
  • --border-default / --border-subtle — list chrome
  • --color-accent — low coupling bar fill (< 10)
  • --color-warning — medium coupling (10–19)
  • --color-danger — high coupling (≥ 20)
  • --font-mono — file paths and counts
  • intel-page-header / intel-meta-bar / intel-meta-pill — standard Intel Hub header

Current state

What                              Status
musehub_intel_coupling table      ✅ (repo_id PK, file_a PK, file_b PK, co_changes, ref)
CouplingProvider                  ⚠️ uses _run_muse subprocess; breaks without a local repo
DB indexes                        ❌ only ix_intel_coupling_repo; missing co_changes and file_a indexes
/intel/coupling route + template  ❌
Dashboard card                    ❌
Tests                             ❌

Phase 0 — Rewrite CouplingProvider to pure SQL

Replace _run_muse with a BFS commit walk over musehub_symbol_history_entries.

Algorithm (mirrors muse code coupling exactly):

1. Fetch all commits for repo → commit_parents dict
2. BFS from HEAD ref, cap at _MAX_WALK = 10,000 commits
3. Bulk-fetch history entries for repo → (commit_id, address)
4. For each commit in walk:
     file = address.split("::")[0]   # bare paths are valid at file level
     collect distinct files per commit
5. If len(files) > _MAX_FILES_PER_COMMIT (200) → skip (mass commit)
6. For each unordered file pair (a < b) → pair_co_changes[(a,b)] += 1
7. Filter: co_changes >= _MIN_CO_CHANGES (2)
8. Sort DESC by co_changes, truncate to _MAX_PAIRS (200)
9. DELETE stale rows for repo, upsert fresh set

Key difference from EntangleProvider: file-level not symbol-level. Bare path addresses (no ::) are valid here — they represent files directly. No import filter. No Jaccard rate — raw count is the signal.
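
Steps 1 to 8 above can be sketched in memory as follows. This is an illustration under stated assumptions, not the shipped provider: `walk_commits` and `count_pairs` are hypothetical names, the commit graph and history entries are passed in as plain Python values rather than fetched with SQL, and the tie-break on equal counts (lexicographic pair order) is an assumption the issue does not specify.

```python
from collections import Counter, deque
from itertools import combinations

_MAX_WALK = 10_000
_MAX_FILES_PER_COMMIT = 200
_MAX_PAIRS = 200
_MIN_CO_CHANGES = 2


def walk_commits(head, commit_parents, max_walk=_MAX_WALK):
    """BFS from head over the parent graph, visiting at most max_walk commits."""
    seen = {head}
    order = []
    queue = deque([head])
    while queue and len(order) < max_walk:
        commit_id = queue.popleft()
        order.append(commit_id)
        for parent in commit_parents.get(commit_id, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return order


def count_pairs(walk, entries):
    """Rank co-changing file pairs; entries are (commit_id, address) rows."""
    in_walk = set(walk)
    files_by_commit = {}
    for commit_id, address in entries:
        if commit_id in in_walk:
            # "path::symbol" -> "path"; a bare path is kept as-is.
            files_by_commit.setdefault(commit_id, set()).add(address.split("::")[0])

    pair_co_changes = Counter()
    for files in files_by_commit.values():
        if len(files) > _MAX_FILES_PER_COMMIT:
            continue  # mass commit: skip entirely
        # sorted() + combinations() yields each unordered pair once, with a < b.
        for a, b in combinations(sorted(files), 2):
            pair_co_changes[(a, b)] += 1

    kept = [(pair, n) for pair, n in pair_co_changes.items() if n >= _MIN_CO_CHANGES]
    kept.sort(key=lambda item: (-item[1], item[0]))  # tie-break is an assumption
    return kept[:_MAX_PAIRS]


commit_parents = {"c3": ["c2"], "c2": ["c1"], "c1": []}
entries = [
    ("c3", "src/a.py::fn"), ("c3", "src/b.py::g"),
    ("c2", "src/a.py::fn"), ("c2", "src/b.py"), ("c2", "src/c.py::h"),
    ("c1", "src/c.py::h"),
]
ranked = count_pairs(walk_commits("c3", commit_parents), entries)
# ranked == [(("src/a.py", "src/b.py"), 2)]
```

Because each commit's files are collected into a set first, two symbols in the same file never produce a self-pair, which is the behaviour CP_04 and CP_14 describe.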

Class docstring:

class CouplingProvider:
    """Persist co-changing file pairs by mining musehub_symbol_history_entries.

    Mirrors ``muse code coupling`` exactly — same BFS commit walk, same
    mass-commit exclusion, same minimum co-change threshold.

    Unlike EntangleProvider (symbol-level), this works at the file level.
    For each history entry, the file is derived as address.split("::")[0].
    Bare-path entries (no "::") are treated as filenames directly — they
    are valid signals at the file level, unlike at the symbol level.

    Algorithm
    ---------
    1. Fetch all commits → BFS walk from HEAD, cap _MAX_WALK.
    2. Bulk-fetch history entries for repo.
    3. Per commit: collect distinct file set.
    4. Skip commits with > _MAX_FILES_PER_COMMIT distinct files.
    5. Accumulate pair_co_changes[(a, b)] for every (a < b) file pair.
    6. Filter co_changes >= _MIN_CO_CHANGES; sort DESC; truncate _MAX_PAIRS.
    7. DELETE stale rows; upsert fresh set.

    Constants
    ---------
    _MAX_WALK             = 10_000   BFS walk cap (commits visited)
    _MAX_FILES_PER_COMMIT = 200      mass-commit guard (tighter than symbol-level 500)
    _MAX_PAIRS            = 200      stored leaderboard size
    _MIN_CO_CHANGES       = 2        noise floor
    """
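
Step 7 (DELETE stale rows, then write the fresh set) might look like the following sketch. It uses the standard-library sqlite3 module as a stand-in for the real database layer; the column types, the `replace_pairs` name, and the use of a plain INSERT after the DELETE are assumptions, with only the column names and composite primary key taken from the current-state table in this issue.

```python
import sqlite3

# Column names and primary key follow the issue; the types are assumed.
SCHEMA = """
CREATE TABLE musehub_intel_coupling (
    repo_id    TEXT    NOT NULL,
    file_a     TEXT    NOT NULL,
    file_b     TEXT    NOT NULL,
    co_changes INTEGER NOT NULL,
    ref        TEXT    NOT NULL,
    PRIMARY KEY (repo_id, file_a, file_b)
)
"""


def replace_pairs(conn, repo_id, ref, pairs):
    """Swap the stored pair set for one repo; pairs are ((file_a, file_b), n)."""
    rows = [(repo_id, a, b, n, ref) for (a, b), n in pairs]
    with conn:  # one transaction: the delete and the inserts commit together
        conn.execute(
            "DELETE FROM musehub_intel_coupling WHERE repo_id = ?", (repo_id,)
        )
        # No rows for this repo remain, so a plain INSERT is enough here.
        conn.executemany(
            "INSERT INTO musehub_intel_coupling "
            "(repo_id, file_a, file_b, co_changes, ref) VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
replace_pairs(conn, "r1", "ref-1", [(("a.py", "b.py"), 3), (("a.py", "c.py"), 2)])
replace_pairs(conn, "r1", "ref-2", [(("a.py", "b.py"), 4)])  # a.py/c.py is purged
```

Running the function twice with the same input leaves the same rows behind, which is the property the state-integrity tier checks.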

Phase 1 — Migration 0011

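# alembic/versions/0011_coupling_indexes.py
from alembic import op

revision = "0011"
down_revision = "0010"

def upgrade():
    op.create_index("ix_intel_coupling_repo_co",
                    "musehub_intel_coupling", ["repo_id", "co_changes"])
    op.create_index("ix_intel_coupling_repo_file_a",
                    "musehub_intel_coupling", ["repo_id", "file_a"])

def downgrade():
    # Drop in reverse order of creation.
    op.drop_index("ix_intel_coupling_repo_file_a", table_name="musehub_intel_coupling")
    op.drop_index("ix_intel_coupling_repo_co", table_name="musehub_intel_coupling")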

Add both indexes to MusehubIntelCoupling.__table_args__ in musehub_models.py.


Phase 2 — SCSS (two-file split)

src/scss/components/_coupling.scss — visual only:

.cp-list           border, border-radius, overflow hidden, bg-surface
.cp-pair-row       border-top subtle, hover bg-hover transition
.cp-file-a         color text-muted, font-mono
.cp-file-b         color accent-link, font-mono
.cp-arrow          color text-muted, font-size 0.75rem, flex-shrink 0
.cp-count          font-mono, font-weight 700, color text-primary, flex-shrink 0
.cp-bar-track      bg-elevated, border-radius 2px, overflow hidden
.cp-bar-fill       height 100%, bg accent, border-radius 2px, transition width 200ms
  &--medium        bg warning
  &--high          bg danger
.cp-filter-label   uppercase, muted, font-size 0.68rem, letter-spacing 0.07em
.cp-empty-state    text-center, color muted, padding 3rem

src/scss/pages/_coupling.scss — layout only:

.cp-wrap           padding 0
.intel-page-header margin-bottom 1.25rem
.cp-filter-bar     flex row, gap 0.75rem, margin-bottom 1rem, flex-wrap wrap
.cp-filter-group   flex, align-center, gap 0.4rem
.cp-list           flex-col
.cp-pair-row       grid: "files count" "bar bar" / 1fr auto; padding 0.75rem 1rem; gap 0.35rem 0.75rem
.cp-files          grid-area files; flex row; gap 0.5rem; min-width 0; overflow hidden
.cp-count          grid-area count; align-self center; flex-shrink 0
.cp-bar-wrap       grid-area bar; height 3px

Wire into app.scss:

@use "components/coupling";
@use "pages/coupling" as page-coupling;

Phase 3 — Route + template

GET /{owner}/{repo_slug}/intel/coupling

async def intel_coupling_page(
    request, owner, repo_slug, db,
    min_co: int = 2,
    top: int = 50,
):
    """Render the file co-change coupling leaderboard.

    Reads from musehub_intel_coupling ordered by co_changes DESC.
    Applies min_co filter and top limit in SQL.  Computes bar widths
    by normalising each co_changes against the top pair's value.
    File paths are shortened to their last two path components for display.

    Query parameters
    ----------------
    min_co : int   Minimum co-change count inclusive (default 2).
    top    : int   Page size; one of [25, 50, 100, 200] (default 50).

    Template context
    ----------------
    pairs        list[dict]  — file_a, file_b, short_a, short_b,
                               co_changes, bar_pct, heat_modifier
    total_count  int         — total pairs stored before filter
    min_co       int         — active filter
    selected_top int         — active page size
    valid_tops   list[int]   — [25, 50, 100, 200]
    index_meta   IndexMeta | None
    """
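
The read path described in the docstring (filter by min_co, order by co_changes DESC, limit to top) could be expressed as one parameterised query. The sketch below uses sqlite3 for illustration only; `fetch_pairs`, the fallback to 50 for an unlisted page size, and the secondary ordering by file path are assumptions.

```python
import sqlite3

VALID_TOPS = [25, 50, 100, 200]
DEFAULT_TOP = 50


def fetch_pairs(conn, repo_id, min_co=2, top=DEFAULT_TOP):
    """Return (file_a, file_b, co_changes) rows for the leaderboard."""
    if top not in VALID_TOPS:
        top = DEFAULT_TOP  # assumption: unlisted page sizes fall back to the default
    return conn.execute(
        "SELECT file_a, file_b, co_changes "
        "FROM musehub_intel_coupling "
        "WHERE repo_id = ? AND co_changes >= ? "
        "ORDER BY co_changes DESC, file_a, file_b "
        "LIMIT ?",
        (repo_id, min_co, top),
    ).fetchall()


# Tiny in-memory demo with a reduced column set.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE musehub_intel_coupling "
    "(repo_id TEXT, file_a TEXT, file_b TEXT, co_changes INTEGER)"
)
conn.executemany(
    "INSERT INTO musehub_intel_coupling VALUES (?, ?, ?, ?)",
    [("r1", "a.py", "b.py", 33), ("r1", "a.py", "c.py", 4), ("r2", "x.py", "y.py", 50)],
)
rows = fetch_pairs(conn, "r1", min_co=5)
# rows == [("a.py", "b.py", 33)]
```

Filtering on repo_id in the same WHERE clause is also what keeps one repo's pairs out of another's page, as CP_41 requires.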

Heat modifier: "high" if co_changes >= 20, "medium" if >= 10, else "". Short path: keep the last 2 components, e.g. "musehub/services/musehub_wire.py" → "services/musehub_wire.py".
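
The two display rules above, plus the bar-width normalisation from the route docstring, are small enough to sketch as pure functions. `heat_modifier`, `cp_short`, and `bar_pct` are illustrative names (the shipped helper is referred to later in this thread as `_cp_short`), and rounding the percentage to an integer is an assumption.

```python
def heat_modifier(co_changes: int) -> str:
    """Map a co-change count to the CSS modifier suffix ("" means low)."""
    if co_changes >= 20:
        return "high"
    if co_changes >= 10:
        return "medium"
    return ""


def cp_short(path: str, keep: int = 2) -> str:
    """Keep only the last `keep` path components for display."""
    return "/".join(path.split("/")[-keep:])


def bar_pct(co_changes: int, top_value: int) -> int:
    """Bar width as a percentage of the top pair's count (rounding assumed)."""
    if top_value <= 0:
        return 0
    return round(100 * co_changes / top_value)


print(cp_short("musehub/services/musehub_wire.py"))  # services/musehub_wire.py
print(heat_modifier(19), bar_pct(19, 33))            # medium 58
```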

Template: musehub/templates/musehub/pages/intel_coupling.html

  • Breadcrumb: owner / repo / intel / coupling
  • <header class="intel-page-header"> with icon + desc
  • intel-meta-bar: pairs count | ref | built
  • Filter form: min_co number input + top select + Apply button
  • .cp-list with .cp-pair-row for each pair

Phase 4 — Dashboard card + grid update

Add 6th card to intel_dashboard.html after the entangle card:

{# Coupling #}
<div class="intel-card">
  <div class="intel-card-hd">
    <span class="intel-card-title">{{ icon("zap", 12) }} COUPLING</span>
    <a href="{{ base_url }}/intel/coupling" class="intel-card-more">View all →</a>
  </div>
  {% if coupling_count > 0 %}
  <div class="intel-dead-summary">
    <span class="intel-dead-count" style="color:var(--color-warning)">
      {{ coupling_count | fmtnum }}
    </span> pair{{ "s" if coupling_count != 1 }}
  </div>
  <ul class="intel-dead-list">
    {% for p in coupling_preview %}
    <li class="intel-dead-row">
      <span class="intel-dead-addr font-mono" title="{{ p.file_a }} ↔ {{ p.file_b }}">
        {{ p.short_a }} ↔ {{ p.short_b }}
      </span>
      <span class="intel-dead-age">{{ p.co_changes | fmtnum }}</span>
    </li>
    {% endfor %}
  </ul>
  {% else %}
  <div class="intel-card-empty">No coupling data yet.</div>
  {% endif %}
</div>

Update .intel-cards grid in pages/_intel.scss:

.intel-cards {
  grid-template-columns: repeat(6, 1fr);
  @media (max-width: 1400px) { grid-template-columns: repeat(3, 1fr); }
  @media (max-width: 900px)  { grid-template-columns: repeat(2, 1fr); }
  @media (max-width: 540px)  { grid-template-columns: 1fr; }
}

Dashboard route adds coupling_count + coupling_preview (top 3, non-test files) to context.
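
One way the preview could be assembled is sketched below. The issue does not say how a test file is recognised or whether one test file disqualifies the whole pair, so both rules here (a `tests` directory component or a `test_` filename prefix; either side disqualifies) are assumptions, as are the names `is_test_path` and `coupling_preview`.

```python
from pathlib import PurePosixPath


def is_test_path(path: str) -> bool:
    """Heuristic (assumed): under a tests/ directory or named test_*."""
    p = PurePosixPath(path)
    return "tests" in p.parts[:-1] or p.name.startswith("test_")


def coupling_preview(pairs, limit=3):
    """Pick the first `limit` pairs with no test file on either side.

    `pairs` is expected to be sorted by co_changes descending already.
    """
    picked = [
        p for p in pairs
        if not (is_test_path(p["file_a"]) or is_test_path(p["file_b"]))
    ]
    return picked[:limit]


pairs = [
    {"file_a": "musehub/api/routes/wire.py", "file_b": "tests/test_wire.py", "co_changes": 40},
    {"file_a": "musehub/api/routes/wire.py", "file_b": "musehub/services/musehub_wire.py", "co_changes": 33},
    {"file_a": "musehub/models/musehub.py", "file_b": "musehub/services/musehub_repository.py", "co_changes": 19},
]
preview = coupling_preview(pairs)
# preview holds the 33 and 19 pairs; the pair touching tests/ is dropped
```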


Phase 5 — Test suite (CP_01–CP_49)

Tier 1 — Unit (CP_01–CP_08)

CP_01  file from symbol address: "src/a.py::fn" → "src/a.py"
CP_02  bare path treated as file: "cloudflare" → "cloudflare"
CP_03  pair key canonical a < b lexicographically
CP_04  same-file pair excluded (file_a == file_b)
CP_05  heat_modifier "" for co_changes < 10
CP_06  heat_modifier "medium" for co_changes 10–19
CP_07  heat_modifier "high" for co_changes >= 20
CP_08  _MIN_CO_CHANGES constant == 2

Tier 2 — Integration (CP_09–CP_18)

CP_09  empty repo → no pairs, empty result
CP_10  no history entries → no pairs
CP_11  single co-change commit → co_changes=1, below threshold, no row stored
CP_12  two co-change commits → co_changes=2, one pair stored
CP_13  three files in one commit → 3 cross-file pairs
CP_14  two symbols in same file → no pair (file_a == file_b)
CP_15  stored pair always has file_a <= file_b
CP_16  ref column populated correctly
CP_17  co_changes count exact
CP_18  provider result key == "intel.code.coupling"

Tier 3 — E2E (CP_19–CP_25)

CP_19  three files across 5 commits → correct co_changes ranking
CP_20  result metadata count matches stored rows
CP_21  truncated=True when pairs exceed MAX_PAIRS
CP_22  min_co filter excludes low-signal pairs from route
CP_23  top=25 returns at most 25 rows from route
CP_24  heat_modifier "high" on pairs with co_changes >= 20
CP_25  bar_pct == 100 for top pair

Tier 4 — Performance (CP_26–CP_32)

CP_26  10 commits × 10 files → completes < 500ms
CP_27  100 commits × 20 files → completes < 2s
CP_28  empty repo fast-path → < 50ms
CP_29  second run not > 5× slower than first
CP_30  point lookup (fetch pairs for repo) < 10ms after run
CP_31  200-pair leaderboard renders in route < 200ms
CP_32  dashboard preview query < 20ms

Tier 5 — State integrity (CP_33–CP_38)

CP_33  idempotent: two runs produce identical rows
CP_34  stale rows purged: DELETE before upsert
CP_35  incremental: new commits add new pairs on re-run
CP_36  no duplicate (file_a, file_b) rows after 3 runs
CP_37  co_changes increases when more co-change commits added
CP_38  truncated=False when pairs <= MAX_PAIRS

Tier 6 — Security (CP_39–CP_44)

CP_39  SQL injection in file path stored verbatim, table survives
CP_40  XSS payload in file path stored safely
CP_41  repo A pairs never visible when querying repo B
CP_42  two repos each get independent pair sets
CP_43  re-run for new ref updates ref column on all rows
CP_44  unicode in file path handled without crash
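
CP_39 and CP_40 rest on two standard practices: bound parameters on the way in and HTML escaping on the way out. The sketch below illustrates both with sqlite3 and `html.escape`; the throwaway `pairs` table is hypothetical, and `html.escape` stands in for whatever escaping the template engine applies.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pairs (repo_id TEXT, file_a TEXT, file_b TEXT, co_changes INTEGER)"
)

hostile = "a.py'; DROP TABLE pairs; --"
xss = "<script>alert(1)</script>.py"

# Bound parameters: the driver treats both strings as data, never as SQL.
conn.execute("INSERT INTO pairs VALUES (?, ?, ?, ?)", ("r1", hostile, xss, 2))

stored = conn.execute("SELECT file_a, file_b FROM pairs").fetchone()

# Escape at render time so the stored value stays verbatim in the database.
rendered = html.escape(stored[1])
print(rendered)  # &lt;script&gt;alert(1)&lt;/script&gt;.py
```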

Tier 7 — Stress (CP_45–CP_49)

CP_45  MAX_PAIRS cap: 50 files × 3 commits → stored <= MAX_PAIRS
CP_46  mass-commit exclusion: commit with > 200 files skipped
CP_47  500 commits × 5 files → completes without error
CP_48  result count matches stored rows
CP_49  BFS walk cap: commits_analysed <= MAX_WALK

Acceptance criteria

  • CouplingProvider uses pure-SQL BFS — no _run_muse, no local repo required
  • Migration 0011 adds ix_intel_coupling_repo_co and ix_intel_coupling_repo_file_a
  • /intel/coupling page renders from DB, median load < 200ms
  • Heat bars: accent (< 10) / warning (10–19) / danger (≥ 20)
  • File paths truncated to last 2 components in display
  • Dashboard 6th card wired; .intel-cards updated to 6-col grid
  • 49 tests (CP_01–CP_49) across 7 tiers, all green
  • Data parity: co_changes values match muse code coupling --json exactly
  • No regressions on existing intel pages
Activity (2)
gabriel opened this issue 48 days ago
gabriel 48 days ago

Phase 5 Complete — All 53 Tests Pass ✓

Coupling test suite shipped and all green:

Tier         Cases            Coverage
Unit         CP_01–CP_08      file derivation, heat modifier, pair canonicalisation
Integration  CP_09–CP_18      provider upserts, re-runs, counts, ref column
E2E          CP_19–CP_25      full seeded scenarios, min_co filter, top limit
Performance  CP_26–CP_32      timing bounds under realistic load
State        CP_33–CP_38      idempotency, stale-row purge, incremental
Security     CP_39–CP_44      injection strings, repo isolation, unicode paths
Stress       CP_45–CP_49      MAX_PAIRS cap, mass-commit exclusion, BFS cap
Helper       TestCpShort ×4   _cp_short display helper

All phases (0–5) complete. Merged dev→main, pushed local. Ready for staging deploy on your go.

gabriel 48 days ago

Phase 5 complete — all 53 coupling tests pass (7 tiers: Unit CP_01-08, Integration CP_09-18, E2E CP_19-25, Performance CP_26-32, State CP_33-38, Security CP_39-44, Stress CP_45-49, plus 4 TestCpShort helpers). Merged dev→main, pushed local. Ready for staging deploy on your go.