gabriel / musehub public
Closed #21 Enhancement
filed by gabriel human · 48 days ago

feat(intel): Code Map — structural dependency topology page

Multi-Dimensional Code Intelligence Findings

Before designing this feature, muse code was run across seven axes to understand the codebase topology.

muse code codemap — current output shape

modules:          239 files  (top: docs/reference/type-contracts.md sym=626)
import_cycles:    0          (clean — no circular imports detected)
high_centrality:  get=1819   commit=1306  execute=612  select=457  (stdlib + SA dominate)
boundary_files:   musehub/db/__init__.py (fan_out=7), repos.py (fan_out=6)
agent_safe_zones: 15 files   (low-churn migration files)

muse code gravity — blast-radius leaders

musehub/storage/backends.py::LocalBackend._path        gravity_pct=38.5
musehub/services/musehub_wire.py::wire_push_stream     (extreme centrality)
musehub/db/__init__.py                                 (fan_out=7 → imported by all routes)

muse code hotspots — churn leaders

musehub/services/musehub_wire.py::wire_push_stream
musehub/mcp/dispatcher.py::_call_tool
musehub/storage/backends.py::LocalBackend

muse code entangle — always-change-together pairs

musehub/services/musehub_wire.py  ↔  tests/test_wire_push_stream.py  (co_change_rate=1.0)

muse code velocity

musehub/api/routes/musehub/  — highest active_commits (route layer most volatile)
musehub/services/            — second highest
tests/                       — co-moves with services

muse code dead — 29 high-confidence dead-code candidates (known, not blocking)

Import-record anatomy (from parse_symbols)

# parse_symbols returns import records with qualified_name:
# 'musehub/api/repos.py::import::get_db'
#   qualified_name = 'import::musehub.db::get_db'
#   → module = 'musehub.db'
#   → file   = 'musehub/db/__init__.py'  (resolve via manifest)
#
# This lets us compute fan_in / fan_out at push time — zero subprocess needed.
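
A minimal sketch of that resolution step, assuming the three-part `import::<module>::<name>` shape shown in the comment above. The helper names `module_of` and `module_to_file` are hypothetical, and the manifest is a hand-written stand-in for the real path → object_id mapping. The lookup tries `__init__.py` first and then `.py`, following the order the provider docstring describes.

```python
from typing import Optional


def module_of(qualified_name: str) -> Optional[str]:
    """Return the dotted module from 'import::<module>::<name>', else None."""
    parts = qualified_name.split("::")
    if len(parts) != 3 or parts[0] != "import":
        return None
    return parts[1]


def module_to_file(module: str, manifest: dict[str, str]) -> Optional[str]:
    """Map a dotted module to a tracked path: package __init__.py first, then .py."""
    base = module.replace(".", "/")
    for candidate in (f"{base}/__init__.py", f"{base}.py"):
        if candidate in manifest:
            return candidate
    return None  # stdlib, third-party, or otherwise untracked


# Illustrative manifest (object ids are made up).
manifest = {"musehub/db/__init__.py": "obj-1", "musehub/api/repos.py": "obj-2"}
module = module_of("import::musehub.db::get_db")
resolved = module_to_file(module, manifest) if module else None
```

Returning `None` for anything outside the manifest keeps stdlib and third-party imports out of the graph without a separate allow-list.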

Architecture: No Subprocess. Ever.

Three prior agents called _run_muse subprocess for codemap. Do not do this.

The correct pattern (proven by ApiSurfaceProvider, TypeProvider, LanguagesProvider):

Push event
  │
  ├─ manifest_blob (msgpack path→object_id)
  ├─ get_backend(owner, slug).get(object_id)  ← raw file bytes from storage
  ├─ parse_symbols(src, path)                  ← pure AST, no I/O
  ├─ language_of(path)                         ← pure dict lookup
  │
  ├─ Build import graph: qualified_name → module → file (resolved via manifest)
  ├─ Compute fan_in / fan_out per file
  ├─ Detect cycles (Tarjan's SCC, a DFS over the import graph)
  │
  └─ Batch-upsert → musehub_intel_codemap_modules
     Upsert       → musehub_intel_codemap_meta
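
The fan_in / fan_out step in the flow above can be sketched as follows. This is an illustration under assumptions: the function name `fan_metrics` is hypothetical, the graph is a small hand-written example rather than real musehub data, and every edge is assumed to be already resolved to a tracked file.

```python
from collections import Counter


def fan_metrics(graph: dict[str, set[str]]) -> tuple[dict[str, int], dict[str, int]]:
    """Return (fan_in, fan_out) for every file that appears in the graph."""
    fan_out = {src: len(targets) for src, targets in graph.items()}
    fan_in: Counter = Counter()
    for targets in graph.values():
        fan_in.update(targets)  # one increment per resolved edge A -> B
    files = set(graph) | set(fan_in)
    return (
        {f: fan_in.get(f, 0) for f in files},
        {f: fan_out.get(f, 0) for f in files},
    )


# Illustrative adjacency dict: file -> {imported_files}.
graph = {
    "musehub/api/repos.py": {"musehub/db/__init__.py", "musehub/models/musehub.py"},
    "musehub/services/musehub_wire.py": {"musehub/db/__init__.py"},
    "musehub/db/__init__.py": set(),
}
fan_in, fan_out = fan_metrics(graph)
total_edges = sum(fan_out.values())
```

Because each edge adds one to a single fan_out and one to a single fan_in, the two totals always match, which makes a cheap consistency check for the meta row.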

Patch targets (module-level imports, patchable in tests):

from musehub.storage.backends     import get_backend
from muse.plugins.code.ast_parser import parse_symbols
from muse.plugins.code._query     import language_of
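
Why module-level imports matter for tests: a name bound at module level is looked up in the module's globals on each call, so `unittest.mock.patch` can replace it. The sketch below uses a throwaway module named `fake_providers` as a hypothetical stand-in for the real providers module. The `count_symbols` helper and the shape of the mocked return value are made up for illustration.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical stand-in for the providers module. The real module would bind
# parse_symbols with a module-level `from ... import parse_symbols`.
SOURCE = '''
def parse_symbols(src, path):
    raise RuntimeError("real parser should not run in this test")

def count_symbols(src, path):
    # parse_symbols is resolved from module globals at call time.
    return len(parse_symbols(src, path))
'''
fake_providers = types.ModuleType("fake_providers")
exec(SOURCE, fake_providers.__dict__)
sys.modules["fake_providers"] = fake_providers

# Patch the name where it is looked up, not where it was defined.
with patch("fake_providers.parse_symbols", return_value={"a": {}, "b": {}}) as mocked:
    patched_count = fake_providers.count_symbols(b"x = 1", "x.py")
```

Had the import lived inside the function body, the patch target would not exist on the module and the real parser would run.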

Web GUI — ASCII Mockup

┌──────────────────────────────────────────────────────────────────────────────┐
│  ← Intel Hub                                                                  │
│  ◈ Code Map                                                                   │
│  Structural dependency topology — modules, import edges, and cycle detection. │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                                │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │     239      │  │    1,842     │  │      7       │  │      0       │     │
│  │   MODULES    │  │    EDGES     │  │  LANGUAGES   │  │    CYCLES    │     │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘     │
│   (spectral grad)   (--color-teal)    (--color-purple)  (--color-success)    │
│                                                                                │
│  Sort  [symbols ▼] [fan-in] [fan-out]   Show  [20] [50] [100]                │
│                                                                                │
│  ┌─────────────────────────────────────────────────────────────────────────┐  │
│  │  FILE                             SYM   FAN-IN        FAN-OUT  LANG    │  │
│  ├─────────────────────────────────────────────────────────────────────────┤  │
│  │  musehub/db/__init__.py            48   ████████ 12      7    Python   │  │
│  │  musehub/models/musehub.py        181   █████    7       0    Python   │  │
│  │  musehub/services/musehub_wire.py  93   ████     5       4    Python   │  │
│  │  musehub/api/routes/musehub/       38   ███      4       6    Python   │  │
│  │  musehub/types/json_types.py       12   ██       3       2    Python   │  │
│  │  src/scss/app.scss                  0   ─        0       3    SCSS     │  │
│  │  ...                                                                   │  │
│  └─────────────────────────────────────────────────────────────────────────┘  │
│                                                                                │
│  ┌─────────────────────────────────────────────────────────────────────────┐  │
│  │  ✓ No import cycles detected                                            │  │
│  └─────────────────────────────────────────────────────────────────────────┘  │
│                                                                                │
│  ┌─────────────────────────────────────────────────────────────────────────┐  │
│  │  ⬡ Most-imported modules (fan-in leaders)                               │  │
│  │  ┌──────────────────────────────────────────────────────────┐          │  │
│  │  │  musehub/db/__init__.py      ████████████████████  12    │          │  │
│  │  │  musehub/models/musehub.py   ██████████████        7     │          │  │
│  │  │  musehub/types/json_types.py ████████              4     │          │  │
│  │  └──────────────────────────────────────────────────────────┘          │  │
│  └─────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────────┘

Dashboard card (intel hub):

┌────────────────────────────────┐
│  ◈ CODE MAP         View all → │
│  239 modules · 1,842 edges     │
│  ┌──────────────────────────┐  │
│  │ musehub/db/__init__  ██12│  │
│  │ musehub/models       ██7 │  │
│  │ musehub/services     ██5 │  │
│  └──────────────────────────┘  │
│  ✓ 0 import cycles             │
└────────────────────────────────┘

Spectral Theme Tokens

Element                    Token
Page title icon            var(--color-accent)
Modules stat chip          var(--gradient-spectral) val
Edges stat chip            var(--color-teal) border
Languages stat chip        var(--color-purple) border
Cycles chip (clean)        var(--color-success) border
Cycles chip (dirty)        var(--color-danger) border
Fan-in bars                var(--gradient-spectral)
Fan-out bars               var(--color-teal) solid
Language badges            Reuse .ln-kind-chip tokens
Subpage header title       .intel-subhd-title--spectral
Module name                var(--font-mono) primary
Count values               var(--font-mono) secondary

DB Schema

Migration 0015

Table 1: musehub_intel_codemap_modules

CREATE TABLE musehub_intel_codemap_modules (
    repo_id       VARCHAR(128) REFERENCES musehub_repos(repo_id) ON DELETE CASCADE,
    file_path     VARCHAR(512),
    language      VARCHAR(128) NOT NULL DEFAULT '',
    symbol_count  INTEGER      NOT NULL DEFAULT 0,
    import_count  INTEGER      NOT NULL DEFAULT 0,
    fan_in        INTEGER      NOT NULL DEFAULT 0,  -- files that import this
    fan_out       INTEGER      NOT NULL DEFAULT 0,  -- files this imports
    ref           VARCHAR(128) NOT NULL,
    PRIMARY KEY (repo_id, file_path)
);
CREATE INDEX ix_intel_codemap_modules_repo ON musehub_intel_codemap_modules (repo_id);

Table 2: musehub_intel_codemap_meta

CREATE TABLE musehub_intel_codemap_meta (
    repo_id         VARCHAR(128) PRIMARY KEY REFERENCES musehub_repos(repo_id) ON DELETE CASCADE,
    total_modules   INTEGER NOT NULL DEFAULT 0,
    total_edges     INTEGER NOT NULL DEFAULT 0,  -- sum of all fan_out values
    language_count  INTEGER NOT NULL DEFAULT 0,
    cycle_count     INTEGER NOT NULL DEFAULT 0,
    cycles_json     JSONB,  -- list of cycle lists [[file_a, file_b, file_c], ...]
    ref             VARCHAR(128) NOT NULL
);
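
The composite primary key on (repo_id, file_path) is what lets a re-run overwrite a module row instead of duplicating it. The sketch below shows that behaviour with Python's built-in `sqlite3` as a stand-in. The trimmed-down table name and columns are illustrative, the production database is assumed to be PostgreSQL (the `JSONB` column suggests it), and the `ON CONFLICT ... DO UPDATE` form requires SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE codemap_modules (
        repo_id   TEXT NOT NULL,
        file_path TEXT NOT NULL,
        fan_in    INTEGER NOT NULL DEFAULT 0,
        fan_out   INTEGER NOT NULL DEFAULT 0,
        ref       TEXT NOT NULL,
        PRIMARY KEY (repo_id, file_path)
    )
    """
)

# Insert, or overwrite the existing row for the same (repo_id, file_path).
UPSERT = """
INSERT INTO codemap_modules (repo_id, file_path, fan_in, fan_out, ref)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (repo_id, file_path) DO UPDATE SET
    fan_in  = excluded.fan_in,
    fan_out = excluded.fan_out,
    ref     = excluded.ref
"""

conn.execute(UPSERT, ("repo-1", "musehub/db/__init__.py", 11, 7, "ref-a"))
conn.execute(UPSERT, ("repo-1", "musehub/db/__init__.py", 12, 7, "ref-b"))
conn.execute(UPSERT, ("repo-2", "musehub/db/__init__.py", 3, 1, "ref-c"))

rows = conn.execute(
    "SELECT repo_id, fan_in, ref FROM codemap_modules ORDER BY repo_id"
).fetchall()
```

The second upsert for `repo-1` replaces the first, while the `repo-2` row stays separate. That matches the double-upsert, overwrite, and cross-repo isolation checks in the data-integrity tier.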

Provider Algorithm

class CodemapProvider:
    """
    Persist structural topology derived from stored snapshot objects.

    ┌────────────────────────────────────────────────────────────┐
    │  Snapshot manifest (path → object_id)                      │
    │        │                                                    │
    │        ▼                                                    │
    │  per file: parse_symbols() → count syms + build imports    │
    │        │                                                    │
    │        ▼                                                    │
    │  import qualified_name → module path → resolve to file     │
    │        │                                                    │
    │        ▼                                                    │
    │  Build adjacency dict: file → {imported_files}             │
    │        │                                                    │
    │        ├─▶ fan_out[file]  = len(imported_files)            │
    │        ├─▶ fan_in[file]   = count of files that import it  │
    │        └─▶ DFS cycle detection                             │
    │                                                             │
    │  Batch-upsert modules (1000-row chunks)                    │
    │  Upsert meta (1 row)                                       │
    └────────────────────────────────────────────────────────────┘
    """

    _CHUNK = 1_000
    _IMPORT_KIND = "import"

    def _module_path_to_file(
        self,
        module: str,                 # e.g. "musehub.db"
        manifest: dict[str, str],    # path → object_id
    ) -> str | None:
        """
        Resolve a Python dotted-module name to a repo-relative file path.

        Tries __init__.py first, then .py.  Returns None when the module
        cannot be resolved to a tracked file (stdlib, third-party, etc.).
        """

    def _detect_cycles(
        self,
        graph: dict[str, set[str]],  # file → {imported_files}
    ) -> list[list[str]]:
        """
        Tarjan's SCC algorithm to find all strongly-connected components
        with more than one node (= import cycles).
        """
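
A standalone sketch of the cycle-detection stub, written as a free function rather than a method so it runs on its own. It is a textbook recursive Tarjan and not necessarily the shipped code. Components and their members are sorted only to make the output deterministic. A very deep import chain could hit Python's recursion limit, in which case an iterative variant would be needed.

```python
def detect_cycles(graph: dict[str, set[str]]) -> list[list[str]]:
    """Return strongly-connected components with more than one node."""
    index: dict[str, int] = {}
    lowlink: dict[str, int] = {}
    stack: list[str] = []
    on_stack: set[str] = set()
    cycles: list[list[str]] = []
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for nxt in sorted(graph.get(node, ())):
            if nxt not in index:
                visit(nxt)
                lowlink[node] = min(lowlink[node], lowlink[nxt])
            elif nxt in on_stack:
                lowlink[node] = min(lowlink[node], index[nxt])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop it off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            if len(component) > 1:
                cycles.append(sorted(component))

    for node in sorted(graph):
        if node not in index:
            visit(node)
    return cycles


# Illustrative graph: a.py and b.py import each other; c.py only imports a.py.
graph = {"a.py": {"b.py"}, "b.py": {"a.py"}, "c.py": {"a.py"}, "d.py": set()}
cycles = detect_cycles(graph)
```

Here `cycles` holds the single two-file component, and an empty graph yields an empty list. A file that imports itself forms a one-node component, so under the "more than one node" rule it is not reported.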

Seven-Tier Test Plan

T01–T05   Tier 1 — Unit (DB models, nullable fields, PKs, cascade, meta table)
T06–T11   Tier 2 — Integration / Provider
              T06  no subprocess (_run_muse absent from source)
              T07  symbol_count excludes imports
              T08  fan_out computed from resolved import graph
              T09  fan_in counted correctly across files
              T10  cycle detection finds 2-file cycle
              T11  empty manifest returns []
T12–T19   Tier 3 — Route (200, empty state, 404, sort=symbols/fan-in/fan-out,
                           top=20/50/100, 422 on bad top, meta row present)
T20–T23   Tier 4 — E2E HTML (fan-in bars render, cycle panel ✓/⚠, dashboard link,
                               language badges, fmtnum on all counts)
T24–T26   Tier 5 — Data integrity (double-upsert single row, overwrite fan_in,
                                    cross-repo isolation)
T27–T29   Tier 6 — Performance (provider <3s/200 files, route <200ms/5k modules,
                                  DB index on repo_id used)
T30–T32   Tier 7 — Security (XSS in file_path escaped, SQL injection in sort param,
                               no 500 on non-existent repo)

Total: 32 tests across 7 tiers. All written RED before Phase 4 begins.


Implementation Phases (load-bearing order)

Phase 1 — DB Migration + ORM Models

Files: alembic/versions/0015_codemap_tables.py, musehub/db/musehub_models.py

  • Migration 0015: create musehub_intel_codemap_modules + musehub_intel_codemap_meta
  • Add MusehubIntelCodemapModule and MusehubIntelCodemapMeta ORM models with full NumPy-style docstrings
  • Index: ix_intel_codemap_modules_repo on (repo_id)
  • Cascade delete from musehub_repos

Phase 2 — CodemapProvider (pure Python, zero subprocess)

File: musehub/services/musehub_intel_providers.py

  • Implement CodemapProvider.compute():
    • Walk manifest via get_backend(owner, slug).get(object_id)
    • Call parse_symbols(src, path) per file
    • Derive import graph: qualified_name → module path → resolve to file in manifest
    • Compute fan_out (per file) and fan_in (post-processing pass)
    • Detect cycles via Tarjan's SCC
    • Batch-upsert modules (1000-row chunks)
    • Upsert meta row
  • Register as "intel.code.codemap" in _PROVIDER_REGISTRY
  • Add to get_intel_job_types() list
  • Module-level imports: get_backend, parse_symbols, language_of (required for test patchability)
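
The 1000-row batching mentioned above can be sketched with plain slicing. The helper name `chunked` and the row dictionaries are illustrative, and the actual upsert call is left out because its session API is not shown in this issue.

```python
from collections.abc import Iterator, Sequence

CHUNK = 1_000  # mirrors CodemapProvider._CHUNK


def chunked(rows: Sequence[dict], size: int = CHUNK) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Illustrative rows; a real run would carry every column of the modules table.
rows = [{"file_path": f"pkg/mod_{i}.py", "fan_in": 0, "fan_out": 0} for i in range(2_500)]
batch_sizes = [len(batch) for batch in chunked(rows)]
```

Each yielded batch would feed one upsert statement, which keeps statement size bounded on large repos.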

Phase 3 — TDD: 32 Tests RED

File: tests/test_intel_codemap.py

All 32 tests written and confirmed failing (route returns 404, template does not exist) before Phase 4 begins.

Phase 4 — Route Handler

File: musehub/api/routes/musehub/ui_intel.py

  • Add intel_codemap_page at GET /{owner}/{repo_slug}/intel/codemap
  • Query musehub_intel_codemap_modules (sorted, limited) + musehub_intel_codemap_meta
  • Sort params: symbols (default) | fan-in | fan-out; coerce unknown → symbols
  • Top params: 20 (default) | 50 | 100; coerce invalid → 20
  • Bar width: fan_in normalised against max_fan_in in visible set
  • Context: modules, meta, total_modules, total_edges, cycle_count, cycles, selected_sort, selected_top, valid_tops, valid_sorts
  • Document the new route in the module docstring
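
The parameter coercion and bar-width rules in this phase can be sketched as pure functions. The names `coerce_sort`, `coerce_top`, and `bar_widths` are hypothetical, and expressing the bar width as an integer percentage is an assumption, since the issue only says fan_in is normalised against the visible maximum.

```python
VALID_SORTS = ("symbols", "fan-in", "fan-out")
VALID_TOPS = (20, 50, 100)


def coerce_sort(raw) -> str:
    """Fall back to 'symbols' for anything outside the allow-list."""
    return raw if raw in VALID_SORTS else "symbols"


def coerce_top(raw) -> int:
    """Fall back to 20 for missing, non-numeric, or unsupported values."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return 20
    return value if value in VALID_TOPS else 20


def bar_widths(fan_ins: list[int]) -> list[int]:
    """Scale each fan_in to a 0-100 width relative to the visible maximum."""
    peak = max(fan_ins, default=0)
    if peak == 0:
        return [0 for _ in fan_ins]
    return [round(100 * value / peak) for value in fan_ins]


widths = bar_widths([12, 7, 5, 4, 3, 0])
```

Because the sort key is matched against a fixed allow-list before it reaches any query, arbitrary strings in the `sort` parameter never become part of the SQL.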

Phase 5 — Template

File: musehub/templates/musehub/pages/intel_codemap.html

  • Extends musehub/base.html, uses intel-wrap
  • Stat chips row: Modules / Edges / Languages / Cycles (green ✓ or red ⚠)
  • Sort + top filter bar (reuse intel-filter-pill pattern)
  • Module list: cm-list-hd + cm-row grid — file | symbols | fan-in bar | fan-out | language badge
  • Cycle panel: green ✓ when cycle_count == 0, else red ⚠ with cycle paths listed
  • Fan-in leaders panel: top-5 most-imported modules with spectral bars
  • Empty state: code icon + "Push a commit to populate the code map."

Phase 6 — SCSS (.cm- namespace)

Files: src/scss/components/_codemap.scss, src/scss/pages/_codemap.scss

components/_codemap.scss (visual):

  • .cm-stat-card with per-variant border tints (teal=edges, purple=languages, success/danger=cycles)
  • .cm-stat-val with var(--gradient-spectral) background-clip text
  • .cm-stat-lbl uppercase muted label
  • .cm-row hover: var(--bg-surface)
  • .cm-fan-bar: var(--gradient-spectral) fill
  • .cm-fan-out-bar: var(--color-teal) fill
  • .cm-cycle-ok: var(--color-success) icon + text
  • .cm-cycle-warn: var(--color-danger) icon + text
  • .cm-lang-badge — reuse .ln-kind-chip visual, no size change

pages/_codemap.scss (layout):

  • .cm-stats-row: flex, gap, wrap
  • .cm-list: margin-top
  • .cm-list-hd + .cm-row: grid-template-columns: 1fr 4rem 10rem 5rem 7rem (file | sym | fan-in bar | fan-out | lang)
  • Responsive: collapse fan-out + lang at 700px
  • .cm-bar-cell: flex, items center, gap
  • .cm-leaders-panel: margin-top, border, padding

Wire into app.scss after @use "components/languages" and @use "pages/languages".

Phase 7 — Dashboard Card + Wire-Up

Files: musehub/api/routes/musehub/ui_intel.py, musehub/templates/musehub/pages/intel_dashboard.html

  • Dashboard query: codemap_meta (1 row) + codemap_modules_preview (top 5 by fan_in)
  • Context keys: codemap_total_modules, codemap_total_edges, codemap_cycle_count, codemap_preview
  • Dashboard card: ◈ CODE MAP icon (accent), module + edge counts, top-3 fan-in bars, cycle status chip

Docstring Standard

Every provider, model, and route function must carry a NumPy-style docstring. Minimum example:

class CodemapProvider:
    """Persist structural dependency topology derived from stored snapshot objects.

    Reads the HEAD snapshot manifest, calls ``parse_symbols()`` per file to
    extract import relationships, resolves dotted module names back to tracked
    file paths, computes ``fan_in`` / ``fan_out`` per module, runs Tarjan's SCC
    for cycle detection, and batch-upserts results into
    ``musehub_intel_codemap_modules`` and ``musehub_intel_codemap_meta``.

    No subprocess is ever spawned.  All data flows from objects stored at push
    time via ``get_backend(owner, slug).get(object_id)``.

    Parameters
    ----------
    session : AsyncSession
    repo_id : str
    ref : str
    payload : JSONObject

    Returns
    -------
    IntelResults
        ``[("intel.code.codemap", {"modules": N, "edges": E, "cycles": C})]``
        on success.  ``[]`` when the snapshot manifest cannot be resolved.

    Notes
    -----
    Fan-in is computed in a post-processing pass after the full manifest is
    walked: for each resolved import edge ``A → B``, ``fan_in[B] += 1``.
    Stdlib and third-party imports that cannot be resolved to a tracked file
    path are silently skipped; an import counts toward ``fan_out`` only when
    its resolved file is actually in the manifest.
    Tarjan's algorithm runs in O(V + E) on the resolved import graph; for a
    typical repo of 300 files and 1,800 edges this completes in < 10 ms.
    """

Acceptance Criteria

  • All 32 tests GREEN
  • https://staging.musehub.ai/gabriel/musehub/intel/codemap returns 200 with real data
  • Modules stat chip shows total (not page-length) — uses DB COUNT aggregate
  • All numbers pass through | fmtnum filter
  • No _run_muse / create_subprocess_exec in CodemapProvider.compute
  • Cycles panel shows ✓ green for musehub (0 cycles confirmed by CLI output)
  • Fan-in bars use var(--gradient-spectral)
  • Dashboard card links to /intel/codemap
  • Deployed to staging, issue #8 updated

Activity (1)
gabriel opened this issue 48 days ago
gabriel 48 days ago

32/32 tests GREEN. Deployed to staging as image 82ee0f34-20260503144856. All acceptance criteria met — closing.