gabriel / musehub public
Closed #20 Enhancement
filed by gabriel human · 48 days ago

feat: intel/languages GUI page — language breakdown web dashboard

0 Anchors
Blast radius
Churn 30d
0 Proposals

Summary

Port `muse code languages` into the Intel Hub as a fully interactive web page. No subprocess at any point — background worker on every `muse push` aggregates language stats from stored snapshot objects using pure-Python helpers; a SQL query drives the UI.

The DB model (`MusehubIntelLanguages`) and provider skeleton (`LanguagesProvider`) already exist but both have critical architectural violations that must be fixed before the GUI can be built.


The Architecture — Read This First

Every intel provider in this codebase follows one pattern:

muse push received
       │
       ▼
worker: fetch snapshot manifest from musehub_snapshots.manifest_blob (msgpack)
       │
       ▼
worker: for each file path → object_id in manifest:
         src = await get_backend(owner, slug).get(object_id)   ← bytes from object store
         language = language_of(file_path)                     ← pure Python, zero I/O
         tree = parse_symbols(src, file_path)                  ← pure AST, zero subprocess
       │
       ▼
worker: aggregate counts → batch-upsert to musehub_intel_languages
       │
       ▼
HTTP request: SELECT FROM musehub_intel_languages WHERE repo_id = $1
       │
       ▼
template rendered — zero muse subprocesses, ever

`language_of` is a pure Python function — `muse.plugins.code._query.language_of` — that maps file extension to display language name via a dict lookup. `parse_symbols` is a pure AST parser — `muse.plugins.code.ast_parser.parse_symbols`. Neither touches disk, forks a process, or needs a repo checkout.

This is identical to TypeProvider and ApiSurfaceProvider. Copy that pattern exactly.


Current State

Layer Status
DB table `musehub_intel_languages` ✅ exists — missing `kinds_json` JSONB column
ORM model `MusehubIntelLanguages` ✅ exists — missing `kinds_json` field
`LanguagesProvider.compute()` ❌ uses `_run_muse` subprocess + row-by-row upserts
Route `intel_languages_page` ❌ missing
Template `intel_languages.html` ❌ missing
SCSS `pages/_languages.scss` + `components/_languages.scss` ❌ missing
Dashboard card on `intel_dashboard.html` ❌ missing
Test suite `tests/test_intel_languages.py` ❌ missing

Data Shape — `muse code languages --json` output

{
  "languages": [
    {
      "language": "Python",
      "file_count": 451,
      "symbol_count": 10869,
      "kinds": {
        "variable": 1092,
        "function": 1346,
        "async_function": 2305,
        "class": 1447,
        "method": 2214,
        "async_method": 2465
      }
    },
    {
      "language": "Markdown",
      "file_count": 29,
      "symbol_count": 1488,
      "kinds": { "section": 1348, "variable": 140 }
    },
    {
      "language": "HTML",
      "file_count": 106,
      "symbol_count": 129,
      "kinds": { "section": 129 }
    },
    {
      "language": "TOML",
      "file_count": 1,
      "symbol_count": 52,
      "kinds": { "section": 17, "variable": 35 }
    }
  ]
}

The `kinds` dict is the richest field — it breaks down symbol composition inside each language. It is NOT currently stored in the DB; Phase 1 adds it.


Page Design

┌───────────────────────────────────────────────────────────────────────────────────┐
│  ← Intel Hub    ◈ Languages                                                       │
│  Code composition by programming language — files, symbols, and kind breakdown.   │
├───────────────────────────────────────────────────────────────────────────────────┤
│                                                                                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐         │
│  │     21       │  │   12,544     │  │     588      │  │      5       │         │
│  │  Languages   │  │   Symbols    │  │    Files     │  │  With Code   │         │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘         │
│                                                                                   │
│  Sort: [ Name ▼ ] [ Files ] [ Symbols ]                                           │
│                                                                                   │
│ ┌──────────────────────────────────────────────────────────────────────────────┐  │
│ │  LANGUAGE            FILES    SYMBOLS    COMPOSITION                         │  │
│ ├──────────────────────────────────────────────────────────────────────────────┤  │
│ │                                                                               │  │
│ │  Python              451     10,869   ████████████████████████░  86.7%      │  │
│ │  ╰─ fn: 1,346  fn~: 2,305  class: 1,447  method: 2,214  method~: 2,465     │  │
│ │     var: 1,092                                                                │  │
│ │                                                                               │  │
│ │  Markdown             29      1,488   ███░░░░░░░░░░░░░░░░░░░░░░  11.9%      │  │
│ │  ╰─ section: 1,348  var: 140                                                  │  │
│ │                                                                               │  │
│ │  HTML                106        129   █░░░░░░░░░░░░░░░░░░░░░░░░   1.0%      │  │
│ │  ╰─ section: 129                                                               │  │
│ │                                                                               │  │
│ │  TOML                  1         52   ░░░░░░░░░░░░░░░░░░░░░░░░░   0.4%      │  │
│ │  ╰─ section: 17  var: 35                                                      │  │
│ │                                                                               │  │
│ │  (no ext)              6          0   ░░░░░░░░░░░░░░░░░░░░░░░░░   0.0%      │  │
│ │                                                                               │  │
│ └──────────────────────────────────────────────────────────────────────────────┘  │
│                                                                                   │
│  Sort: name ▼  |  files  |  symbols                                               │
└───────────────────────────────────────────────────────────────────────────────────┘

Show all languages (including those with 0 symbols) — they tell you about file types present in the repo (assets, config, docs). Filter by "With code only" to focus on symbolic languages.


Implementation Plan — Load-Bearing Order

Phase 1 — DB schema: add `kinds_json` column

File: Alembic migration (next version after current head) File: `musehub/db/musehub_models.py`

Add `kinds_json` JSONB column to `musehub_intel_languages`:

# musehub_models.py addition to MusehubIntelLanguages:
kinds_json: Mapped[dict | None] = mapped_column(
    JSON, nullable=True, default=None,
    comment="kind → count breakdown, e.g. {'function': 1346, 'class': 1447}",
)

Migration stub:

# alembic/versions/XXXX_add_kinds_json_to_intel_languages.py
def upgrade():
    op.add_column(
        "musehub_intel_languages",
        sa.Column("kinds_json", postgresql.JSONB(astext_type=sa.Text()), nullable=True),
    )

def downgrade():
    op.drop_column("musehub_intel_languages", "kinds_json")

Why load-bearing first: provider (Phase 2) writes this column; all tests assume it exists.


Phase 2 — Provider rewrite: `LanguagesProvider`

File: `musehub/services/musehub_intel_providers.py`

Replace the subprocess + row-by-row pattern entirely. The new provider:

  1. Reads the snapshot manifest from `musehub_snapshots.manifest_blob`
  2. Calls `language_of(file_path)` — a pure dict lookup, no I/O
  3. Fetches file bytes via `get_backend(owner, slug).get(object_id)`
  4. Calls `parse_symbols(src, file_path)` for each file — pure AST
  5. Aggregates `file_count`, `symbol_count`, `kinds` per language
  6. Computes `pct` = symbol_count / total_symbols * 100 (0.0 if no symbols)
  7. Batch-upserts all rows in a single chunked call
from muse.plugins.code._query import language_of
from muse.plugins.code.ast_parser import parse_symbols
from musehub.storage.backends import get_backend

class LanguagesProvider:
    """Persist language composition derived from stored snapshot objects.

    Reads the HEAD snapshot manifest from ``musehub_snapshots.manifest_blob``
    (msgpack dict of path → object_id), calls ``language_of()`` (pure extension
    map — no subprocess, no I/O) to group files by language, fetches each file's
    bytes from the storage backend, and runs ``parse_symbols`` (pure AST) to
    count symbols and kind breakdowns per language.  Aggregated results are
    batch-upserted into ``musehub_intel_languages`` in a single SQL statement
    (language count is always small — typically < 20 rows).

    This follows the same DB-only pattern as ``TypeProvider`` and
    ``ApiSurfaceProvider``: all data flows from objects stored at push time;
    nothing requires a working muse repository on disk.  The subprocess-based
    ``_run_muse`` call is explicitly prohibited.

    Parameters
    ----------
    session : AsyncSession
        Active async SQLAlchemy session provided by the push worker.
    repo_id : str
        Opaque repo identifier from ``musehub_repos.repo_id``.
    ref : str
        Branch or commit ref string at push time (e.g. ``"dev"``).
    payload : JSONObject
        Raw push-event JSON — used to resolve owner/slug for the storage
        backend when not already present in the session.

    Returns
    -------
    IntelResults
        ``[("intel.code.languages", {"count": N})]`` on success where N is
        the number of distinct languages found.  ``[]`` when the snapshot
        manifest cannot be resolved or the repo has no tracked files.

    Notes
    -----
    Import pseudo-symbols (``kind == "import"``) are excluded from counts to
    match ``muse code languages`` default behaviour.  \`pct\` is computed over
    total symbol count across all languages (0.0 when no symbols exist).
    """

    _CHUNK = 1_000
    _IMPORT_KINDS: frozenset[str] = frozenset({"import"})

    async def compute(
        self, session: AsyncSession, repo_id: str, ref: str, payload: JSONObject
    ) -> IntelResults:
        import msgpack

        # 1. Resolve owner/slug
        # 2. Fetch HEAD snapshot manifest  (same boilerplate as TypeProvider)
        # 3. Walk manifest: language_of(path) → file counts (pure dict lookup)
        # 4. Fetch bytes + parse_symbols → kind counts per language
        # 5. Compute pct
        # 6. Batch-upsert in one shot (< 20 rows normally)

Critical imports to add at module level (for patchability in tests):

from muse.plugins.code._query import language_of as _language_of
from muse.plugins.code.ast_parser import parse_symbols as _parse_symbols

Phase 3 — TDD: `tests/test_intel_languages.py` (all 30 tests RED first)

Write ALL tests before touching the route or template. Tests must fail (404 / import errors / assertion errors) before Phase 4 exists.

Pattern: mirror `tests/test_intel_api_surface.py` and `tests/test_intel_type.py`.

Tier 1 — Unit (T01–T04)
T01  MusehubIntelLanguages row insert + read-back (repo_id + language PK)
T02  kinds_json field stores and retrieves dict correctly
T03  on_conflict_do_update: upsert overwrites symbol_count/file_count/kinds_json/ref
T04  cascade delete: dropping repo purges all language rows
Tier 2 — Integration / Provider (T05–T09)
T05  LanguagesProvider.compute() reads snapshot manifest — no _run_muse call ever
T06  3-language snapshot: correct file_count and symbol_count per language
T07  kinds_json populated correctly: Python entry has function/class/method keys
T08  pct computed correctly: sums to ~100.0 across all languages with symbols
T09  Empty manifest returns [] without touching musehub_intel_languages
Tier 3 — Route (T10–T17)
T10  GET /{owner}/{repo}/intel/languages → 200 with repo that has data
T11  GET /{owner}/{repo}/intel/languages → empty state for repo with no data (no 500)
T12  Repo not found → 404
T13  sort=symbols: rows ordered by symbol_count DESC
T14  sort=files: rows ordered by file_count DESC
T15  sort=name: rows ordered alphabetically
T16  show=all: languages with 0 symbols included
T17  show=code: only languages with symbol_count > 0 returned
Tier 4 — E2E HTML (T18–T21)
T18  Stat chip "Languages" shows correct total count
T19  Stat chip "Symbols" shows sum of all symbol counts
T20  Bar width attribute reflects pct value from DB
T21  kinds_json chips rendered per language row
Tier 5 — Data Integrity (T22–T24)
T22  Two sequential upserts produce exactly N rows (no duplicates per language)
T23  After language renamed across pushes: old name row updated, not duplicated
T24  Cross-repo isolation: repo A's language rows never visible from repo B
Tier 6 — Performance (T25–T27)
T25  Provider processes 50-language manifest in < 500ms (mocked backend)
T26  Route handler responds in < 200ms with 20 languages in DB
T27  Languages query hits ix_intel_languages_repo index (EXPLAIN check)
Tier 7 — Security (T28–T30)
T28  language field rendered in template: "<script>" is escaped, not executed
T29  sort= query param with SQL injection string → coerced to default, no error
T30  Unauthenticated request to private repo → 403 or redirect

Phase 4 — Route Handler

File: `musehub/api/routes/musehub/ui_intel.py`

Add after `intel_api_surface_page`. Registered at `/{owner}/{repo}/intel/languages`.

Valid sort values: `"name"` (default), `"files"`, `"symbols"`. Valid show values: `"all"` (default), `"code"` (symbol_count > 0 only).

_VALID_LN_SORTS: frozenset[str] = frozenset({"name", "files", "symbols"})
_VALID_LN_SHOW:  frozenset[str] = frozenset({"all", "code"})

@router.get("/{owner}/{repo_slug}/intel/languages")
async def intel_languages_page(
    request: Request,
    owner: SlugParam,
    repo_slug: SlugParam,
    sort: str = "symbols",
    show: str = "all",
    db: AsyncSession = Depends(get_db),
) -> Response:
    """Language composition dashboard — files, symbols, and kind breakdown per language.

    Reads exclusively from ``musehub_intel_languages``; no ``muse code``
    subprocess runs at request time.  Each row represents one language detected
    in the HEAD snapshot, including files with no symbols (assets, config, docs).

    Parameters
    ----------
    sort : str
        Column to sort by: ``"name"`` (alphabetical, default), ``"files"``
        (file_count DESC), or ``"symbols"`` (symbol_count DESC).  Unknown values
        are coerced to ``"symbols"`` so the page never 400s on a bad param.
    show : str
        Filter mode: ``"all"`` includes every language regardless of symbol count;
        ``"code"`` restricts to languages with at least one symbol.  Unknown
        values coerced to ``"all"``.

    Returns
    -------
    Response
        Rendered ``intel_languages.html`` with context keys:
        ``languages``, ``total_languages``, ``total_symbols``, ``total_files``,
        ``code_languages``, ``selected_sort``, ``selected_show``,
        ``valid_sorts``, ``valid_shows``.

    Raises
    ------
    404
        Repository not found or inaccessible.
    """

Context dict must include pre-computed per-row values:

  • `pct_bar` — integer 0–100 for CSS bar width (same as `pct` rounded)
  • `kinds_chips` — list of `{kind, count, label}` dicts for template loops
  • `file_label` — `"file"` or `"files"` (grammatical singular/plural)

Phase 5 — Template `intel_languages.html`

File: `musehub/templates/musehub/pages/intel_languages.html`

Extends: musehub/base.html
Wrapper: <div class="intel-wrap">   ← same as all intel subpages

Key blocks:

  • Subheader: `intel-subhd-title--spectral` · `icon("globe", 16)` in `--color-teal` · "Languages"
  • Stat chips (4): Total Languages · Total Symbols · Total Files · Languages With Code
  • Sort/filter bar: Sort pills (Name / Files / Symbols) + Show toggle (All / Code only)
  • Language table: one expanded row per language with:
    • Language name (large, primary text)
    • File count + symbol count (mono, muted)
    • Spectral gradient bar (proportional to symbol_count / max_symbol_count)
    • Percentage badge
    • Kind chips row: each kind shown as a mini badge with count
  • Empty state: globe icon + "No language data yet — push a commit."

Phase 6 — SCSS

Files:

  • `src/scss/pages/_languages.scss` — layout only (`0 colors, 0 typography`)
  • `src/scss/components/_languages.scss` — visual only

CSS namespace: `.ln-`

// pages/_languages.scss — structural layout
.ln-table         { display: flex; flex-direction: column; gap: 2px; margin-top: 1rem; }
.ln-row           { display: grid; grid-template-columns: 14rem 7rem 7rem 1fr 5rem; ... }
.ln-bar-wrap      { height: 6px; border-radius: 3px; overflow: hidden; }
.ln-bar           { height: 100%; border-radius: 3px; }
.ln-kinds-row     { display: flex; flex-wrap: wrap; gap: 0.35rem; margin-top: 0.35rem; }

// components/_languages.scss — visual
.ln-lang-name     { font-weight: 700; color: var(--text-primary); }
.ln-bar           { background: var(--gradient-spectral); }     // ← spectral gradient
.ln-pct-badge     { font-family: var(--font-mono); color: var(--text-muted); }
.ln-kind-chip     { background: var(--bg-surface); border: 1px solid var(--border-default);
                    border-radius: 4px; font-size: 0.6rem; font-family: var(--font-mono);
                    padding: 1px 6px; color: var(--text-muted); }
.ln-row--no-code  { opacity: 0.45; }   // languages with 0 symbols dimmed

Phase 7 — Dashboard Card + Wire-Up

Files:

  • `musehub/templates/musehub/pages/intel_dashboard.html` — add card after API surface
  • `musehub/api/routes/musehub/ui_intel.py` — add `languages_count` + `languages_preview` to dashboard ctx

Dashboard card design:

┌────────────────────────────────────┐
│  ◈ LANGUAGES          View all →   │
│  21 languages · 12,544 symbols     │
│                                    │
│  Python    ████████████████  86.7% │
│  Markdown  ██░░░░░░░░░░░░░  11.9% │
│  HTML      █░░░░░░░░░░░░░░   1.0% │
└────────────────────────────────────┘

Show top 3 languages by symbol count with proportional mini-bars.


Spectral Theme — What to Use

Element Token
Page accent / icon `var(--color-teal)` + `icon("globe", 16)`
Subheader gradient text `intel-subhd-title--spectral` (CSS class)
Composition bars `var(--gradient-spectral)` — the rainbow gradient
Stat chip values `var(--gradient-spectral)` via `-webkit-background-clip: text`
Zero-symbol rows `opacity: 0.45` — visually de-emphasised, not hidden
Kind chips `var(--bg-surface)` + `var(--border-default)` — neutral, not colored
Dashboard mini-bars `var(--gradient-spectral)`
Percentage badge `var(--font-mono)` + `var(--text-muted)`
Sort pill active `intel-filter-pill--active` (shared class)

Docstring Standard

Every class, method, and function added in this ticket carries NumPy-style docstrings with: one-line summary, `Parameters` section, `Returns` section, `Raises` section where applicable, and a `Notes` section for non-obvious invariants. See `ApiSurfaceProvider` and `intel_api_surface_page` as the canonical templates. No multi-paragraph inline comments — docstrings only.


Files to Create / Modify

File Action
`alembic/versions/XXXX_add_kinds_json_to_intel_languages.py` create
`musehub/db/musehub_models.py` add `kinds_json` to `MusehubIntelLanguages`
`musehub/services/musehub_intel_providers.py` rewrite `LanguagesProvider` — delete `_run_muse` call
`tests/test_intel_languages.py` create (30 tests, TDD-first, all RED)
`musehub/api/routes/musehub/ui_intel.py` add `intel_languages_page` + dashboard ctx
`musehub/templates/musehub/pages/intel_languages.html` create
`musehub/templates/musehub/pages/intel_dashboard.html` add languages card
`src/scss/pages/_languages.scss` create
`src/scss/components/_languages.scss` create
`src/scss/app.scss` add `@use` imports

Acceptance Criteria

  • Phase 1: Migration applies cleanly; `kinds_json` column exists in `musehub_intel_languages`
  • Phase 2: `LanguagesProvider` uses ONLY `language_of` + `parse_symbols` + snapshot manifest — zero `_run_muse` calls, zero subprocess
  • Phase 2: Single batch-upsert statement (all languages ≤ 20 rows typically)
  • Phase 3: All 30 tests written RED before Phase 4 begins; all GREEN after
  • Phase 4: Route returns 200 with data, 404 for unknown repo, coerces bad sort/show params
  • Phase 5: Template renders 4 stat chips, sort bar, language rows with spectral bars and kind chips, empty state
  • Phase 6: SCSS compiles clean, spectral gradient on bars and stat values
  • Phase 7: Dashboard card shows top 3 languages with mini-bars and total count
  • Page live on staging at `/{owner}/{repo}/intel/languages`
  • `muse code languages` subprocess NEVER called in production code path
Activity3
gabriel opened this issue 48 days ago
gabriel 48 days ago

Phase 1 complete ✅

Migration 0014_languages_kinds_json.py applied. kinds_json JSONB column live in musehub_intel_languages. ORM model updated with full docstring. Working on task/intel-languages branch.

Proceeding to Phase 2 — provider rewrite.

gabriel 48 days ago

✅ Phase 3–7 Complete — all 30 tests GREEN, deployed to staging

What shipped

Phase 1 — Migration 0014: kinds_json JSONB column on musehub_intel_languages Phase 2LanguagesProvider rewrite: pure language_of() + parse_symbols() from stored snapshot objects — zero subprocesses, zero on-disk repo required Phase 3 — TDD: 30 tests across 7 tiers (DB model, provider, route, E2E HTML, data integrity, performance, security) — all RED before implementation Phase 4 — Route handler intel_languages_page at /{owner}/{repo_slug}/intel/languages with sort (pct/files/symbols) and top (20/50/100) params Phase 5 — Template intel_languages.html: stat chips, spectral bar per language, kind chips (function/class/method/async_fn/async_method), empty state Phase 6 — SCSS (.ln- namespace): components/_languages.scss (visual) + pages/_languages.scss (grid layout, responsive at 700px) Phase 7 — Dashboard card wired: top-5-by-files preview with spectral bars, link to /intel/languages

Tests

30/30 passed  (8.07s)
T01-T04  DB model: columns, nullable kinds_json, composite PK, cascade delete
T05-T09  Provider: no-subprocess, file counts, kinds_json, pct sum, empty snapshot
T10-T17  Route: 200/empty/404, sort params, top limit, 422 on bad input
T18-T21  E2E HTML: language names, pct bar widths, kind chips, dashboard link
T22-T24  Data integrity: no duplicates, overwrite upsert, cross-repo isolation
T25-T27  Performance: provider <2s/100 files, route <200ms/50 rows, index check
T28-T30  Security: XSS escape, SQL injection in sort param, no-500 on bad path

Live

  • Languages page: https://staging.musehub.ai/gabriel/musehub/intel/languages
  • Dashboard card: https://staging.musehub.ai/gabriel/musehub/intel
  • Deploy tag: 7c3b2a0c-20260503141150
gabriel 13 days ago

Cross-reference: issue #66 covers the full proposal type × strategy × history matrix, including the canonical naming of all 7 ProposalType enum values. Phase 2 of #66 aligns with Phase 3 of this issue (state_merge → merge rename + migration).