gabriel / musehub public
Closed #13
filed by gabriel human · 47 days ago

feat(intel): entangle — symbol co-change GUI (EntangleProvider + /intel/entangle pages)

⚡ Symbol Entanglement — Intelligence Hub

┌─ gabriel / musehub · intel / entangle ─────────────────────────────────────┐
│                                                                              │
│  ⚡ Entangle                              47 pairs · 579 commits · HEAD     │
│                                                                              │
│  ┌───────────────────────┐  ┌──────────────────────┐                        │
│  │  rate ≥  [ 0.0      ] │  │  co-changes ≥  [ 2 ] │  □ linked  □ test     │
│  └───────────────────────┘  └──────────────────────┘                        │
│                                                                              │
│ ┌────────────────────────────────────────────────────────────────────────┐  │
│ │  services/musehub_wire.py                                              │  │
│ │    wire_push_stream                                                    │  │
│ │  ↔  tests/test_wire_push_stream.py              [test]                 │  │
│ │    test_t7_push_stream_response_contains_result_frame                  │  │
│ │  ████████████████████████████████████████████  100%   5 / 5           │  │
│ └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│ ┌────────────────────────────────────────────────────────────────────────┐  │
│ │  api/routes/musehub/repos.py                                           │  │
│ │    fork_repo                                                           │  │
│ │  ↔  models/musehub.py                                                  │  │
│ │    ForkRepoRequest                                                     │  │
│ │  ███████████████████████████████████░░░░░░░░░   80%   4 / 5           │  │
│ └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│ ┌────────────────────────────────────────────────────────────────────────┐  │
│ │  api/routes/wire.py                                                    │  │
│ │    handle_commit_stream                                                │  │
│ │  ↔  services/musehub_wire.py                    [linked]               │  │
│ │    wire_push_stream                                                    │  │
│ │  █████████████████████████░░░░░░░░░░░░░░░░░░   58%   7 / 12           │  │
│ └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

Purpose

Mirror muse code entangle --json output in the MuseHub GUI with full fidelity. The CLI finds symbol pairs that co-change in commits but have no structural import or call-graph link — hidden "keep-in-sync" contracts that cause silent breakage when one side is updated and the other is forgotten.

If muse code entangle returns a pair, the GUI must show it with exactly the same rate, co-change count, and structural-link flag.

CLI reference

muse code entangle [--top N] [--min-rate RATE] [--min-co-changes N]
                   [--symbol ADDRESS] [--since REF] [--include-same-file] [--json]

JSON schema (authoritative)

{
  "ref": "dev",
  "commits_analysed": 579,
  "truncated": false,
  "filters": {
    "min_rate": 0.0,
    "min_co_changes": 2,
    "symbol": null,
    "since": null,
    "include_same_file": false,
    "top": 20,
    "max_commits": 10000
  },
  "pairs": [
    {
      "symbol_a": "musehub/services/musehub_wire.py::wire_push_stream",
      "symbol_b": "tests/test_wire_push_stream.py::test_t7_push_stream_response_contains_result_frame",
      "file_a": "musehub/services/musehub_wire.py",
      "file_b": "tests/test_wire_push_stream.py",
      "same_file": false,
      "structurally_linked": false,
      "co_changes": 5,
      "commits_both_active": 5,
      "co_change_rate": 1.0,
      "a_in_test": false,
      "b_in_test": true
    }
  ]
}
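
To make the schema concrete, here is a minimal sketch of loading this payload into typed records and checking the arithmetic the fields imply. The EntanglePair class and parse_pairs helper are hypothetical names used for illustration, not part of the CLI or MuseHub, and the 0.005 tolerance assumes the rate may be rounded to two decimal places.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EntanglePair:
    """One entry of the "pairs" array; attribute names mirror the JSON keys."""
    symbol_a: str
    symbol_b: str
    file_a: str
    file_b: str
    same_file: bool
    structurally_linked: bool
    co_changes: int
    commits_both_active: int
    co_change_rate: float
    a_in_test: bool
    b_in_test: bool


def parse_pairs(payload: str) -> list[EntanglePair]:
    """Parse --json output and reject pairs whose numbers disagree."""
    doc = json.loads(payload)
    names = [f.name for f in fields(EntanglePair)]
    pairs = []
    for raw in doc["pairs"]:
        pair = EntanglePair(**{name: raw[name] for name in names})
        if not 0 < pair.co_changes <= pair.commits_both_active:
            raise ValueError(f"inconsistent counts: {pair}")
        expected = pair.co_changes / pair.commits_both_active
        if abs(pair.co_change_rate - expected) > 0.005:  # assumed 2dp rounding
            raise ValueError(f"rate does not match counts: {pair}")
        pairs.append(pair)
    return pairs


# Sample trimmed to the keys this sketch reads.
SAMPLE = json.dumps({
    "ref": "dev",
    "commits_analysed": 579,
    "pairs": [{
        "symbol_a": "musehub/services/musehub_wire.py::wire_push_stream",
        "symbol_b": "tests/test_wire_push_stream.py::"
                    "test_t7_push_stream_response_contains_result_frame",
        "file_a": "musehub/services/musehub_wire.py",
        "file_b": "tests/test_wire_push_stream.py",
        "same_file": False,
        "structurally_linked": False,
        "co_changes": 5,
        "commits_both_active": 5,
        "co_change_rate": 1.0,
        "a_in_test": False,
        "b_in_test": True,
    }],
})

for p in parse_pairs(SAMPLE):
    print(f"{p.file_a} <-> {p.file_b}: {p.co_changes} / {p.commits_both_active}")
```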

DB Gap — Migration Required

musehub_intel_entangle exists but is missing fields the CLI returns. Add via Alembic:

Column               Type          Notes
commits_both_active  INTEGER       denominator for rate display
file_a               VARCHAR(512)  for file-scoped queries
file_b               VARCHAR(512)  for file-scoped queries
same_file            BOOLEAN       filter/badge
a_in_test            BOOLEAN       [test] badge on symbol A
b_in_test            BOOLEAN       [test] badge on symbol B

Also add indexes:

  • (repo_id, file_a) — per-file entangle lookups
  • (repo_id, co_change_rate DESC) — leaderboard sort
  • (repo_id, symbol_a) — per-symbol focus view
  • (repo_id, symbol_b) — reverse per-symbol focus view
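
As a rough illustration of the target shape, the sketch below creates the table with the six new columns and the four indexes in an in-memory SQLite database. SQLite only stands in for the production database here; the index names are hypothetical, and the pre-existing columns (repo_id, symbol_a, symbol_b, co_changes, co_change_rate, structurally_linked) and their types are assumptions inferred from the JSON schema. The real change is meant to go through Alembic.

```python
import sqlite3

DDL = """
CREATE TABLE musehub_intel_entangle (
    repo_id             TEXT    NOT NULL,  -- assumed existing column
    symbol_a            TEXT    NOT NULL,  -- assumed existing column
    symbol_b            TEXT    NOT NULL,  -- assumed existing column
    co_changes          INTEGER NOT NULL,  -- assumed existing column
    co_change_rate      REAL    NOT NULL,  -- assumed existing column
    structurally_linked BOOLEAN NOT NULL,  -- assumed existing column
    commits_both_active INTEGER,           -- new: denominator for rate display
    file_a              VARCHAR(512),      -- new
    file_b              VARCHAR(512),      -- new
    same_file           BOOLEAN,           -- new
    a_in_test           BOOLEAN,           -- new
    b_in_test           BOOLEAN            -- new
);
CREATE INDEX ix_entangle_repo_file_a ON musehub_intel_entangle (repo_id, file_a);
CREATE INDEX ix_entangle_repo_rate   ON musehub_intel_entangle (repo_id, co_change_rate DESC);
CREATE INDEX ix_entangle_repo_sym_a  ON musehub_intel_entangle (repo_id, symbol_a);
CREATE INDEX ix_entangle_repo_sym_b  ON musehub_intel_entangle (repo_id, symbol_b);
"""


def build_schema() -> sqlite3.Connection:
    """Create the sketched table and indexes in a throwaway database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def index_names(conn: sqlite3.Connection) -> list[str]:
    """List index names on the table (column 1 of PRAGMA index_list)."""
    rows = conn.execute("PRAGMA index_list('musehub_intel_entangle')").fetchall()
    return sorted(row[1] for row in rows)


def column_names(conn: sqlite3.Connection) -> list[str]:
    """List column names on the table (column 1 of PRAGMA table_info)."""
    rows = conn.execute("PRAGMA table_info('musehub_intel_entangle')").fetchall()
    return [row[1] for row in rows]


conn = build_schema()
print(index_names(conn))
print(column_names(conn))
```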

Phase 0 — Migration

File: alembic/versions/0010_entangle_fields.py

revision = "0010"
down_revision = "0009"

Add the six columns above to musehub_intel_entangle. Add the four indexes above. Update MusehubIntelEntangle mapped columns in musehub_models.py.


Phase 1 — EntangleProvider

File: musehub/services/musehub_intel_providers.py

Algorithm (mirrors CLI exactly)

The CLI mines structured_delta from commit history. On MuseHub, the same data lives in musehub_symbol_history_entries (one row per symbol per commit).

1. BFS-walk musehub_commits from HEAD (same pattern as StableProvider)
   Cap at MAX_WALK = 10,000 commits.

2. Bulk-fetch all musehub_symbol_history_entries for this repo.
   Group by commit_id → set of addresses touched.

3. Skip commits where |symbols_touched| > MAX_SYMBOLS_PER_COMMIT (500).
   These are mass-refactors that produce O(N²) noise.

4. Exclude import pseudo-symbols (address contains "::import::").

5. For each pair (A, B) that appear in the same commit:
   - commits_both_active  = |{commits where A active} ∩ {commits where B active}|
   - co_changes           = |commits where BOTH A and B changed|
   - co_change_rate       = co_changes / commits_both_active
   - same_file            = file_a == file_b
   - structurally_linked  = A's file imports B's file OR vice versa
                            (check musehub_symbol_intel or snapshot imports)
   - a_in_test            = "test" in file_a
   - b_in_test            = "test" in file_b

6. Filter: co_changes >= 2, same_file == False (default)

7. Sort by co_change_rate DESC, co_changes DESC

8. DELETE existing rows for repo_id, then bulk INSERT top 500 pairs.
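
Steps 3 to 7 can be sketched in plain Python over an in-memory mapping of commit to touched symbols. This is an illustration under stated assumptions, not the provider itself: it leaves out the commit walk and database access (steps 1, 2, 8) and the structurally_linked check, which needs import data. The function names are hypothetical. The denominator uses the formula quoted in the status update further down, min(|commits_A|, |commits_B|); whether that equals commits_both_active in every case depends on how "active" is defined, so treat that line as an assumption.

```python
from collections import defaultdict
from itertools import combinations

MAX_SYMBOLS_PER_COMMIT = 500  # step 3
TOP_PAIRS = 500               # step 8 keeps the top 500


def file_of(address: str) -> str:
    """'pkg/mod.py::func' -> 'pkg/mod.py'."""
    return address.split("::", 1)[0]


def entangle(touched_by_commit, min_co_changes=2, include_same_file=False):
    """touched_by_commit maps commit_id -> set of symbol addresses touched."""
    commits_of = defaultdict(set)  # address -> commits that touched it
    co_changes = defaultdict(int)  # (a, b) with a < b -> shared commit count

    for commit_id, touched in touched_by_commit.items():
        if len(touched) > MAX_SYMBOLS_PER_COMMIT:
            continue  # step 3: skip mass-refactor commits
        # step 4: drop import pseudo-symbols
        symbols = sorted(s for s in touched if "::import::" not in s)
        for s in symbols:
            commits_of[s].add(commit_id)
        for a, b in combinations(symbols, 2):  # sorted input, so a < b
            co_changes[(a, b)] += 1

    pairs = []
    for (a, b), n in co_changes.items():
        file_a, file_b = file_of(a), file_of(b)
        same_file = file_a == file_b
        if n < min_co_changes or (same_file and not include_same_file):
            continue  # step 6
        # ASSUMPTION: denominator taken from the status update formula.
        both_active = min(len(commits_of[a]), len(commits_of[b]))
        pairs.append({
            "symbol_a": a, "symbol_b": b,
            "file_a": file_a, "file_b": file_b,
            "same_file": same_file,
            "co_changes": n,
            "commits_both_active": both_active,
            "co_change_rate": n / both_active,
            "a_in_test": "test" in file_a,
            "b_in_test": "test" in file_b,
        })
    # step 7: highest rate first, ties broken by co-change count
    pairs.sort(key=lambda p: (-p["co_change_rate"], -p["co_changes"]))
    return pairs[:TOP_PAIRS]


A = "musehub/services/musehub_wire.py::wire_push_stream"
B = "tests/test_wire_push_stream.py::test_t7_push_stream_response_contains_result_frame"
IMPORT = "musehub/services/musehub_wire.py::import::collections"
history = {f"c{i}": {A, B, IMPORT} for i in range(3)}
for pair in entangle(history):
    print(pair["symbol_a"], "<->", pair["symbol_b"], pair["co_change_rate"])
```

Sorting each commit's symbols before pairing means every pair is produced with the smaller address first, which gives the canonical ordering the integrity tier asks for without a separate normalisation pass.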

Return value

[("intel.code.entangle", {"count": N, "commits_analysed": total, "truncated": N > 500})]

Registration

  • Add "intel.code.entangle": EntangleProvider() to _PROVIDER_REGISTRY
  • Add "intel.code.entangle" to job_types_for_push() for code-domain repos

Phase 2 — SCSS

src/scss/components/_entangle.scss

Visual rules only — colors, backgrounds, transitions, typography. Token vocabulary: --color-purple, --color-warning, --bg-surface, --bg-elevated, --border-default, --border-subtle, --gradient-spectral, --text-primary, --text-secondary, --text-muted, --font-mono.

Key classes:

  • .et-rate-bar — horizontal fill bar, color interpolated by rate (warning→danger at >0.9)
  • .et-rate-val — mono badge showing the rate as a percentage, e.g. 91%
  • .et-pair-row — hover state, transition 120ms
  • .et-sym-a, .et-sym-b — truncated mono address links
  • .et-arrow — the connector, --text-muted
  • .et-badge--test — [test] pill, accent color
  • .et-badge--linked — [linked] pill, success color
  • .et-badge--same-file — [same-file] pill, muted
  • .et-empty-state — flex column, centered, gap 1rem
  • .et-filter-bar — rate / co-changes filter controls

src/scss/pages/_entangle.scss

Structural layout only — display, grid, flex, padding, margin, gap, width.

Key classes:

  • .et-wrap — padding: 0
  • .et-list — display: flex; flex-direction: column
  • .et-pair-row — display: grid; grid-template-columns: 1fr auto auto auto
  • .et-symbols — display: flex; align-items: center; gap: 0.5rem; min-width: 0
  • .et-meta — display: flex; align-items: center; gap: 0.5rem; flex-shrink: 0

Wire both into src/scss/app.scss.


Phase 3 — List Route + Template

Route: GET /{owner}/{repo}/intel/entangle
File: musehub/api/routes/musehub/ui_intel.py

Query parameters

Param           Default  Notes
min_rate        0.0      float 0.0–1.0, validated
min_co          2        int ≥ 1
include_linked  false    show structurally-linked pairs
include_test    false    show pairs where either symbol is in a test file
page            1        pagination, 50 rows/page
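
The parameter rules above can be sketched as a small framework-independent parser. ListParams and parse_list_params are hypothetical names; the sketch assumes the route layer turns a ValueError into a 4xx response, and it clamps page to 1 as ET_SEC_06 describes.

```python
from dataclasses import dataclass

PAGE_SIZE = 50  # rows per page, from the table above


@dataclass(frozen=True)
class ListParams:
    min_rate: float = 0.0
    min_co: int = 2
    include_linked: bool = False
    include_test: bool = False
    page: int = 1

    @property
    def offset(self) -> int:
        """Row offset for the current page; never negative because page >= 1."""
        return (self.page - 1) * PAGE_SIZE


def _flag(query: dict[str, str], name: str) -> bool:
    """Treat only the literal 'true' (any case) as set."""
    return query.get(name, "false").lower() == "true"


def parse_list_params(query: dict[str, str]) -> ListParams:
    """Validate raw query-string values; raises ValueError on bad input."""
    min_rate = float(query.get("min_rate", "0.0"))  # non-numeric text raises
    if not 0.0 <= min_rate <= 1.0:                  # also rejects NaN and inf
        raise ValueError("min_rate must be within 0.0 to 1.0")
    min_co = int(query.get("min_co", "2"))
    if min_co < 1:
        raise ValueError("min_co must be >= 1")
    page = max(1, int(query.get("page", "1")))      # clamp instead of failing
    return ListParams(
        min_rate=min_rate,
        min_co=min_co,
        include_linked=_flag(query, "include_linked"),
        include_test=_flag(query, "include_test"),
        page=page,
    )


print(parse_list_params({"min_rate": "0.9", "page": "3"}))
```

Because the values are converted to float and int before they reach any query, text such as "1; DROP TABLE musehub_intel_entangle" fails at parsing time and never touches SQL.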

Template: intel_entangle.html

╔══════════════════════════════════════════════════════╗
║  ⚡ ENTANGLE          47 pairs · 579 commits          ║
╚══════════════════════════════════════════════════════╝

  [Filter: rate ≥ ____] [co-changes ≥ ____] [☐ linked] [☐ test]

  #   SYMBOL A ↔ SYMBOL B                    RATE   CO/ACTIVE
 ─────────────────────────────────────────────────────────────
  1   wire.py::wire_push_stream              100%   5 / 5
      tests/…::test_t7_push_stream   [test]
      ████████████████████████████████████
 ─────────────────────────────────────────────────────────────
  2   repos.py::fork_repo                     80%   4 / 5
      models.py::ForkRepoRequest
      ████████████████████████████░░░░░░░░

Each pair row links to the per-symbol focus view.
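
For reference, the bar and the RATE / CO/ACTIVE columns in the mockup can be derived from the two stored counts alone. The helper names below are hypothetical and the 36-cell default width is an arbitrary choice for this sketch; the real template would presumably set a CSS width on .et-rate-bar instead of drawing block characters.

```python
def rate_bar(rate: float, width: int = 36) -> str:
    """Return a fixed-width bar whose filled share approximates `rate`."""
    rate = min(1.0, max(0.0, rate))  # clamp defensively
    filled = round(rate * width)
    return "█" * filled + "░" * (width - filled)


def pair_summary(co_changes: int, commits_both_active: int, width: int = 36) -> str:
    """Render 'bar  NN%   co / active' for one pair row."""
    rate = co_changes / commits_both_active
    return f"{rate_bar(rate, width)}  {rate:>4.0%}   {co_changes} / {commits_both_active}"


print(pair_summary(5, 5))
print(pair_summary(4, 5))
print(pair_summary(7, 12))
```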


Phase 4 — Per-Symbol Focus Page

Route: GET /{owner}/{repo}/intel/entangle/symbol?address=…
File: musehub/api/routes/musehub/ui_intel.py

Shows all pairs involving a single symbol, including structurally-linked ones (mirrors muse code entangle --symbol ADDRESS).

Header: symbol address, file, kind badge, total pair count.
Body: same pair-row list but scoped to this symbol as either A or B.
Back link: ← All entangled pairs


Phase 5 — Intel Dashboard Card

Add entangle card to intel_dashboard.html alongside hotspots / dead / blast / stable.

Card content:

  • Title: ⚡ ENTANGLE
  • Count: N pairs in accent color
  • Top 3 pairs with rate badge
  • "View all →" link

Update .intel-cards grid in _intel.scss if column count changes.


Phase 6 — Tests (Eight Tiers)

File: tests/test_entangle.py

Test strings — canonical fixtures

SYMBOL_A = "musehub/services/musehub_wire.py::wire_push_stream"
SYMBOL_B = "tests/test_wire_push_stream.py::test_t7_push_stream_response_contains_result_frame"
SYMBOL_C = "musehub/api/routes/musehub/repos.py::fork_repo"
SYMBOL_D = "musehub/models/musehub.py::ForkRepoRequest"
SYMBOL_IMPORT = "musehub/services/musehub_wire.py::import::collections"  # must be excluded
SYMBOL_MASS = "musehub/db/musehub_models.py::Base"                        # mass-refactor anchor

FILE_A = "musehub/services/musehub_wire.py"
FILE_B = "tests/test_wire_push_stream.py"
FILE_C = "musehub/api/routes/musehub/repos.py"
FILE_D = "musehub/models/musehub.py"

Tier 1 — Unit

ET_U_01  pair_key() returns canonical sorted tuple regardless of A/B order
ET_U_02  rate = co_changes / commits_both_active, rounds to 2dp
ET_U_03  import pseudo-symbol excluded: address containing "::import::" skipped
ET_U_04  same_file detection: FILE_A == FILE_A → True, FILE_A == FILE_B → False
ET_U_05  a_in_test: "test" in file path → True for FILE_B, False for FILE_A
ET_U_06  mass-commit skip: commit touching 501 symbols is excluded from analysis
ET_U_07  min_co_changes=2 filter: pair with co_changes=1 excluded
ET_U_08  min_rate filter: pair with rate=0.3 excluded when min_rate=0.5
ET_U_09  truncation flag: pairs > 500 sets truncated=True
ET_U_10  structurally_linked=True pair excluded from default list, included with include_linked=True
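
A few of the unit cases can be written directly against small pure helpers. The sketch below assumes pair_key() takes two addresses and returns a sorted tuple; its real signature is not given in this issue, and file_of and in_test are hypothetical helper names.

```python
def pair_key(a: str, b: str) -> tuple[str, str]:
    """Canonical key: the lexicographically smaller address comes first."""
    return (a, b) if a <= b else (b, a)


def file_of(address: str) -> str:
    """'pkg/mod.py::func' -> 'pkg/mod.py'."""
    return address.split("::", 1)[0]


def in_test(path: str) -> bool:
    """Substring rule from the algorithm section."""
    return "test" in path


SYMBOL_A = "musehub/services/musehub_wire.py::wire_push_stream"
SYMBOL_B = "tests/test_wire_push_stream.py::test_t7_push_stream_response_contains_result_frame"
FILE_A = "musehub/services/musehub_wire.py"
FILE_B = "tests/test_wire_push_stream.py"


def test_et_u_01_pair_key_is_order_independent():
    assert pair_key(SYMBOL_A, SYMBOL_B) == pair_key(SYMBOL_B, SYMBOL_A)
    assert pair_key(SYMBOL_B, SYMBOL_A) == (SYMBOL_A, SYMBOL_B)


def test_et_u_04_same_file_detection():
    assert file_of(SYMBOL_A) == FILE_A
    assert file_of(SYMBOL_A) != file_of(SYMBOL_B)


def test_et_u_05_in_test_flag():
    assert in_test(FILE_B)
    assert not in_test(FILE_A)


if __name__ == "__main__":
    test_et_u_01_pair_key_is_order_independent()
    test_et_u_04_same_file_detection()
    test_et_u_05_in_test_flag()
    print("ok")
```

One caveat worth a dedicated case: a plain substring check also flags paths such as src/latest.py, since "latest" contains "test".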

Tier 2 — Integration

ET_I_01  seed 3 commits each touching SYMBOL_A + SYMBOL_B → co_changes=3, rate=1.0
ET_I_02  seed 5 commits touching SYMBOL_A, only 3 touching SYMBOL_B → rate=3/5=0.6
ET_I_03  import symbol SYMBOL_IMPORT never appears in output even when it co-changes
ET_I_04  same-file pair excluded by default, present when include_same_file=True
ET_I_05  structurally_linked pair excluded by default, present when include_linked=True
ET_I_06  mass-commit (501 symbols) excluded; provider count reflects reduced pair set
ET_I_07  upsert idempotency: running compute() twice produces identical rows, no duplicates
ET_I_08  stale pair deletion: pair present in run 1, not in run 2 (rate dropped), absent after run 2
ET_I_09  provider returns ("intel.code.entangle", {"count": N, ...}) with correct N
ET_I_10  a_in_test and b_in_test flags stored correctly in DB for FILE_B containing "test"

Tier 3 — End-to-End

ET_E_01  GET /gabriel/musehub/intel/entangle returns 200
ET_E_02  pair row contains SYMBOL_A truncated address and ↔ arrow
ET_E_03  [test] badge present when b_in_test=True
ET_E_04  rate bar width reflects co_change_rate (style attribute contains correct %)
ET_E_05  min_rate=0.9 filter: only 100% pairs shown
ET_E_06  include_linked=true: structurally-linked pairs appear
ET_E_07  /intel/entangle/symbol?address=SYMBOL_A returns 200, shows only pairs involving SYMBOL_A
ET_E_08  GET /gabriel/musehub/intel/entangle with no pairs returns empty-state markup
ET_E_09  intel dashboard card shows ⚡ ENTANGLE and non-zero count after provider runs
ET_E_10  "View all →" link on dashboard card href matches /intel/entangle

Tier 4 — Stress

ET_S_01  10,000 commits each touching 2 symbols → provider completes in < 30s
ET_S_02  repo with 500 symbols all co-changing → exactly 500 pairs stored (truncated=True)
ET_S_03  concurrent provider runs for 5 repos → no cross-repo contamination in musehub_intel_entangle
ET_S_04  single commit touching MAX_SYMBOLS_PER_COMMIT (500) symbols exactly → included
         single commit touching MAX_SYMBOLS_PER_COMMIT + 1 (501) → excluded
ET_S_05  list page with 500 pairs renders < 500ms (template benchmark)

Tier 5 — State

ET_ST_01  provider run 1 writes N pairs; run 2 with fewer commits writes M < N pairs;
          rows from run 1 not in run 2 are deleted (no stale rows)
ET_ST_02  ref column updated on every run; page reads new ref from DB
ET_ST_03  empty repo (0 commits) → provider returns [] and writes 0 rows
ET_ST_04  repo with commits but no symbol history entries → 0 pairs, no crash
ET_ST_05  pair that was A↔B in run 1 appears as A↔B in run 2 (canonical key ordering preserved)

Tier 6 — Integrity

ET_IN_01  co_change_rate stored as float; retrieved value matches inserted value within 1e-6
ET_IN_02  (repo_id, symbol_a, symbol_b) PK prevents duplicate pairs for same repo+run
ET_IN_03  ON DELETE CASCADE: deleting repo removes all entangle rows for that repo
ET_IN_04  symbol_a and symbol_b are always canonically ordered (a <= b lexicographically)
          so the pair billing.py::X ↔ auth.py::Y is stored as auth.py::Y ↔ billing.py::X
ET_IN_05  truncated flag in intel result matches whether stored pairs == 500
ET_IN_06  commits_both_active >= co_changes for every stored row (rate ≤ 1.0 invariant)

Tier 7 — Performance

ET_P_01  provider with 1,000 commits and 10,000 history entries completes in < 10s
ET_P_02  list page query (ORDER BY co_change_rate DESC, LIMIT 50) uses index scan,
         not seq scan (EXPLAIN ANALYZE)
ET_P_03  per-symbol focus query (WHERE symbol_a=? OR symbol_b=?) uses index, not seq scan
ET_P_04  bulk INSERT (500 rows) uses executemany / COPY-style batch, not row-at-a-time loop
ET_P_05  provider memory usage stays < 256 MB for repos with 10k symbols × 1k commits

Tier 8 — Security

ET_SEC_01  address param on /intel/entangle/symbol?address= is length-capped at 512 chars;
           value > 512 returns 400
ET_SEC_02  address containing SQL metacharacters (' OR 1=1 --) treated as literal string,
           no injection
ET_SEC_03  address containing XSS payload (<script>alert(1)</script>) is HTML-escaped in template
ET_SEC_04  unauthenticated GET returns 200 (public repo) or 403 (private repo) — not 500
ET_SEC_05  min_rate param with value "1; DROP TABLE musehub_intel_entangle" returns 422,
           not 500 or silent data loss
ET_SEC_06  page param with value -1 or 0 is clamped to 1, not passed to SQL OFFSET raw
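
The first three security cases come down to a length check, bound query parameters, and escaping on output. The sketch below shows each with the standard library only; the function names are hypothetical, sqlite3 stands in for the production database, and in the real page the template engine's escaping would normally do the job html.escape does here.

```python
import html
import sqlite3

MAX_ADDRESS_LEN = 512  # ET_SEC_01


def validate_address(address: str) -> str:
    """Raise ValueError for over-long input; the route would map this to HTTP 400."""
    if len(address) > MAX_ADDRESS_LEN:
        raise ValueError("address exceeds 512 characters")
    return address


def pairs_for_symbol(conn, repo_id, address):
    """Bound parameters keep SQL metacharacters in the address literal (ET_SEC_02)."""
    addr = validate_address(address)
    sql = (
        "SELECT symbol_a, symbol_b FROM musehub_intel_entangle "
        "WHERE repo_id = ? AND (symbol_a = ? OR symbol_b = ?)"
    )
    return conn.execute(sql, (repo_id, addr, addr)).fetchall()


def focus_heading(address: str) -> str:
    """Escape the address before placing it in markup (ET_SEC_03)."""
    return f"<h1>{html.escape(address)}</h1>"


def demo_connection():
    """One-row table, trimmed to the columns this sketch queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE musehub_intel_entangle (repo_id TEXT, symbol_a TEXT, symbol_b TEXT)"
    )
    conn.execute(
        "INSERT INTO musehub_intel_entangle VALUES (?, ?, ?)",
        ("repo-1", "a.py::f", "b.py::g"),
    )
    return conn


conn = demo_connection()
print(pairs_for_symbol(conn, "repo-1", "b.py::g"))
print(pairs_for_symbol(conn, "repo-1", "' OR 1=1 --"))
print(focus_heading("<script>alert(1)</script>"))
```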

Acceptance Criteria

  • Migration 0010_entangle_fields.py adds all six columns and four indexes
  • MusehubIntelEntangle model updated with new mapped columns
  • EntangleProvider.compute() registered and enqueued on push for code-domain repos
  • Provider output matches muse code entangle --json field-for-field
  • /intel/entangle list page renders pairs with rate bar, co-change count, and badges
  • /intel/entangle/symbol?address=… per-symbol focus view works
  • Intel dashboard card shows entangle pair count
  • All 8 testing tiers pass with the canonical test strings above
  • SCSS split: zero structural rules in components/, zero visual rules in pages/
  • No border-left tier indicators (learned from stable symbols)
  • Numbers formatted with fmtnum throughout
  • Deploy to staging, push to staging main + dev, close this issue
Activity (2)
gabriel opened this issue 47 days ago
gabriel 47 days ago

Status Update

✅ Phase 0 — Migration (complete)

  • alembic/versions/0010_entangle_fields.py — adds commits_both_active, file_a, file_b, same_file, a_in_test, b_in_test to musehub_intel_entangle
  • Added four indexes: (repo_id, file_a), (repo_id, co_change_rate), (repo_id, symbol_a), (repo_id, symbol_b)
  • MusehubIntelEntangle model updated

✅ Phase 1 — EntangleProvider (complete)

  • Rewrote subprocess _run_muse call as pure BFS + SQL — no local repo required
  • BFS-walks musehub_commits from HEAD (same pattern as StableProvider)
  • Bulk-fetches musehub_symbol_history_entries; groups by commit
  • Excludes bare paths (no ::), import pseudo-symbols, mass-refactor commits (> 500 symbols)
  • co_change_rate = co_changes / min(|commits_A|, |commits_B|) — matches CLI exactly
  • Top pairs verified field-for-field against muse code entangle --json on local repo
  • Commit: sha256:41b239182ca3

🔄 Phase 2 — SCSS (in progress)

src/scss/components/_entangle.scss and src/scss/pages/_entangle.scss

⏳ Phases 3–6 pending

gabriel 47 days ago

✅ Issue #13 complete — shipped to staging

All phases (0–5 below) landed on main @ sha256:cedbb6f8 and were deployed to staging.

Phases delivered

Phase  Description                                                                         Commit
0      Migration 0010 — 6 new columns + 4 indexes on musehub_intel_entangle                41b23918
1      Pure-SQL EntangleProvider — BFS walk, Jaccard-min rate, import/mass-commit filters  41b23918
2      SCSS — components/_entangle.scss + pages/_entangle.scss, wired into app.scss        5cdc094b
3      Routes + templates — /intel/entangle list + /intel/entangle/symbol focus            413600d4
4      Dashboard card — 5th card on intel hub, 5-col grid, pair count + preview            9c8e1565
5      Test suite — 54 cases (ET_01–ET_54) across 8 tiers                                  9c8e1565

Polish

  • Inline pair row: symbol_a ↔ symbol_b on one line (matches CLI output style)
  • Header updated to intel-page-header pattern matching stable/dead/hotspots
  • Data parity verified: GUI rate values match muse code entangle --json exactly (all 100% pairs on this repo are genuinely 100% — atomic feature commits)