feat: intel/languages GUI page — language breakdown web dashboard
Summary
Port `muse code languages` into the Intel Hub as a fully interactive web page. No subprocess at any point — background worker on every `muse push` aggregates language stats from stored snapshot objects using pure-Python helpers; a SQL query drives the UI.
The DB model (`MusehubIntelLanguages`) and provider skeleton (`LanguagesProvider`) already exist but both have critical architectural violations that must be fixed before the GUI can be built.
The Architecture — Read This First
Every intel provider in this codebase follows one pattern:
muse push received
│
▼
worker: fetch snapshot manifest from musehub_snapshots.manifest_blob (msgpack)
│
▼
worker: for each file path → object_id in manifest:
src = await get_backend(owner, slug).get(object_id) ← bytes from object store
language = language_of(file_path) ← pure Python, zero I/O
tree = parse_symbols(src, file_path) ← pure AST, zero subprocess
│
▼
worker: aggregate counts → batch-upsert to musehub_intel_languages
│
▼
HTTP request: SELECT FROM musehub_intel_languages WHERE repo_id = $1
│
▼
template rendered — zero muse subprocesses, ever
`language_of` is a pure Python function — `muse.plugins.code._query.language_of` — that maps file extension to display language name via a dict lookup. `parse_symbols` is a pure AST parser — `muse.plugins.code.ast_parser.parse_symbols`. Neither touches disk, forks a process, or needs a repo checkout.
This is identical to TypeProvider and ApiSurfaceProvider. Copy that pattern exactly.
Current State
| Layer | Status |
|---|---|
| DB table `musehub_intel_languages` | ✅ exists — missing `kinds_json` JSONB column |
| ORM model `MusehubIntelLanguages` | ✅ exists — missing `kinds_json` field |
| `LanguagesProvider.compute()` | ❌ uses `_run_muse` subprocess + row-by-row upserts |
| Route `intel_languages_page` | ❌ missing |
| Template `intel_languages.html` | ❌ missing |
| SCSS `pages/_languages.scss` + `components/_languages.scss` | ❌ missing |
| Dashboard card on `intel_dashboard.html` | ❌ missing |
| Test suite `tests/test_intel_languages.py` | ❌ missing |
Data Shape — `muse code languages --json` output
{
"languages": [
{
"language": "Python",
"file_count": 451,
"symbol_count": 10869,
"kinds": {
"variable": 1092,
"function": 1346,
"async_function": 2305,
"class": 1447,
"method": 2214,
"async_method": 2465
}
},
{
"language": "Markdown",
"file_count": 29,
"symbol_count": 1488,
"kinds": { "section": 1348, "variable": 140 }
},
{
"language": "HTML",
"file_count": 106,
"symbol_count": 129,
"kinds": { "section": 129 }
},
{
"language": "TOML",
"file_count": 1,
"symbol_count": 52,
"kinds": { "section": 17, "variable": 35 }
}
]
}
The `kinds` dict is the richest field — it breaks down symbol composition inside each language. It is NOT currently stored in the DB; Phase 1 adds it.
Page Design
┌───────────────────────────────────────────────────────────────────────────────────┐
│ ← Intel Hub ◈ Languages │
│ Code composition by programming language — files, symbols, and kind breakdown. │
├───────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ 21 │ │ 12,544 │ │ 588 │ │ 5 │ │
│ │ Languages │ │ Symbols │ │ Files │ │ With Code │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ Sort: [ Name ▼ ] [ Files ] [ Symbols ] │
│ │
│ ┌──────────────────────────────────────────────────────────────────────────────┐ │
│ │ LANGUAGE FILES SYMBOLS COMPOSITION │ │
│ ├──────────────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ Python 451 10,869 ████████████████████████░ 86.7% │ │
│ │ ╰─ fn: 1,346 fn~: 2,305 class: 1,447 method: 2,214 method~: 2,465 │ │
│ │ var: 1,092 │ │
│ │ │ │
│ │ Markdown 29 1,488 ███░░░░░░░░░░░░░░░░░░░░░░ 11.9% │ │
│ │ ╰─ section: 1,348 var: 140 │ │
│ │ │ │
│ │ HTML 106 129 █░░░░░░░░░░░░░░░░░░░░░░░░ 1.0% │ │
│ │ ╰─ section: 129 │ │
│ │ │ │
│ │ TOML 1 52 ░░░░░░░░░░░░░░░░░░░░░░░░░ 0.4% │ │
│ │ ╰─ section: 17 var: 35 │ │
│ │ │ │
│ │ (no ext) 6 0 ░░░░░░░░░░░░░░░░░░░░░░░░░ 0.0% │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ Sort: name ▼ | files | symbols │
└───────────────────────────────────────────────────────────────────────────────────┘
Show all languages (including those with 0 symbols) — they tell you about file types present in the repo (assets, config, docs). Filter by "With code only" to focus on symbolic languages.
Implementation Plan — Load-Bearing Order
Phase 1 — DB schema: add `kinds_json` column
File: Alembic migration (next version after current head) File: `musehub/db/musehub_models.py`
Add `kinds_json` JSONB column to `musehub_intel_languages`:
# musehub_models.py addition to MusehubIntelLanguages:
kinds_json: Mapped[dict | None] = mapped_column(
JSON, nullable=True, default=None,
comment="kind → count breakdown, e.g. {'function': 1346, 'class': 1447}",
)
Migration stub:
# alembic/versions/XXXX_add_kinds_json_to_intel_languages.py
def upgrade():
op.add_column(
"musehub_intel_languages",
sa.Column("kinds_json", postgresql.JSONB(astext_type=sa.Text()), nullable=True),
)
def downgrade():
op.drop_column("musehub_intel_languages", "kinds_json")
Why load-bearing first: provider (Phase 2) writes this column; all tests assume it exists.
Phase 2 — Provider rewrite: `LanguagesProvider`
File: `musehub/services/musehub_intel_providers.py`
Replace the subprocess + row-by-row pattern entirely. The new provider:
- Reads the snapshot manifest from `musehub_snapshots.manifest_blob`
- Calls `language_of(file_path)` — a pure dict lookup, no I/O
- Fetches file bytes via `get_backend(owner, slug).get(object_id)`
- Calls `parse_symbols(src, file_path)` for each file — pure AST
- Aggregates `file_count`, `symbol_count`, `kinds` per language
- Computes `pct` = symbol_count / total_symbols * 100 (0.0 if no symbols)
- Batch-upserts all rows in a single chunked call
from muse.plugins.code._query import language_of
from muse.plugins.code.ast_parser import parse_symbols
from musehub.storage.backends import get_backend
class LanguagesProvider:
"""Persist language composition derived from stored snapshot objects.
Reads the HEAD snapshot manifest from ``musehub_snapshots.manifest_blob``
(msgpack dict of path → object_id), calls ``language_of()`` (pure extension
map — no subprocess, no I/O) to group files by language, fetches each file's
bytes from the storage backend, and runs ``parse_symbols`` (pure AST) to
count symbols and kind breakdowns per language. Aggregated results are
batch-upserted into ``musehub_intel_languages`` in a single SQL statement
(language count is always small — typically < 20 rows).
This follows the same DB-only pattern as ``TypeProvider`` and
``ApiSurfaceProvider``: all data flows from objects stored at push time;
nothing requires a working muse repository on disk. The subprocess-based
``_run_muse`` call is explicitly prohibited.
Parameters
----------
session : AsyncSession
Active async SQLAlchemy session provided by the push worker.
repo_id : str
Opaque repo identifier from ``musehub_repos.repo_id``.
ref : str
Branch or commit ref string at push time (e.g. ``"dev"``).
payload : JSONObject
Raw push-event JSON — used to resolve owner/slug for the storage
backend when not already present in the session.
Returns
-------
IntelResults
``[("intel.code.languages", {"count": N})]`` on success where N is
the number of distinct languages found. ``[]`` when the snapshot
manifest cannot be resolved or the repo has no tracked files.
Notes
-----
Import pseudo-symbols (``kind == "import"``) are excluded from counts to
match ``muse code languages`` default behaviour. \`pct\` is computed over
total symbol count across all languages (0.0 when no symbols exist).
"""
_CHUNK = 1_000
_IMPORT_KINDS: frozenset[str] = frozenset({"import"})
async def compute(
self, session: AsyncSession, repo_id: str, ref: str, payload: JSONObject
) -> IntelResults:
import msgpack
# 1. Resolve owner/slug
# 2. Fetch HEAD snapshot manifest (same boilerplate as TypeProvider)
# 3. Walk manifest: language_of(path) → file counts (pure dict lookup)
# 4. Fetch bytes + parse_symbols → kind counts per language
# 5. Compute pct
# 6. Batch-upsert in one shot (< 20 rows normally)
Critical imports to add at module level (for patchability in tests):
from muse.plugins.code._query import language_of as _language_of
from muse.plugins.code.ast_parser import parse_symbols as _parse_symbols
Phase 3 — TDD: `tests/test_intel_languages.py` (all 30 tests RED first)
Write ALL tests before touching the route or template. Tests must fail (404 / import errors / assertion errors) before Phase 4 exists.
Pattern: mirror `tests/test_intel_api_surface.py` and `tests/test_intel_type.py`.
Tier 1 — Unit (T01–T04)
T01 MusehubIntelLanguages row insert + read-back (repo_id + language PK)
T02 kinds_json field stores and retrieves dict correctly
T03 on_conflict_do_update: upsert overwrites symbol_count/file_count/kinds_json/ref
T04 cascade delete: dropping repo purges all language rows
Tier 2 — Integration / Provider (T05–T09)
T05 LanguagesProvider.compute() reads snapshot manifest — no _run_muse call ever
T06 3-language snapshot: correct file_count and symbol_count per language
T07 kinds_json populated correctly: Python entry has function/class/method keys
T08 pct computed correctly: sums to ~100.0 across all languages with symbols
T09 Empty manifest returns [] without touching musehub_intel_languages
Tier 3 — Route (T10–T17)
T10 GET /{owner}/{repo}/intel/languages → 200 with repo that has data
T11 GET /{owner}/{repo}/intel/languages → empty state for repo with no data (no 500)
T12 Repo not found → 404
T13 sort=symbols: rows ordered by symbol_count DESC
T14 sort=files: rows ordered by file_count DESC
T15 sort=name: rows ordered alphabetically
T16 show=all: languages with 0 symbols included
T17 show=code: only languages with symbol_count > 0 returned
Tier 4 — E2E HTML (T18–T21)
T18 Stat chip "Languages" shows correct total count
T19 Stat chip "Symbols" shows sum of all symbol counts
T20 Bar width attribute reflects pct value from DB
T21 kinds_json chips rendered per language row
Tier 5 — Data Integrity (T22–T24)
T22 Two sequential upserts produce exactly N rows (no duplicates per language)
T23 After language renamed across pushes: old name row updated, not duplicated
T24 Cross-repo isolation: repo A's language rows never visible from repo B
Tier 6 — Performance (T25–T27)
T25 Provider processes 50-language manifest in < 500ms (mocked backend)
T26 Route handler responds in < 200ms with 20 languages in DB
T27 Languages query hits ix_intel_languages_repo index (EXPLAIN check)
Tier 7 — Security (T28–T30)
T28 language field rendered in template: "<script>" is escaped, not executed
T29 sort= query param with SQL injection string → coerced to default, no error
T30 Unauthenticated request to private repo → 403 or redirect
Phase 4 — Route Handler
File: `musehub/api/routes/musehub/ui_intel.py`
Add after `intel_api_surface_page`. Registered at `/{owner}/{repo}/intel/languages`.
Valid sort values: `"name"` (default), `"files"`, `"symbols"`. Valid show values: `"all"` (default), `"code"` (symbol_count > 0 only).
_VALID_LN_SORTS: frozenset[str] = frozenset({"name", "files", "symbols"})
_VALID_LN_SHOW: frozenset[str] = frozenset({"all", "code"})
@router.get("/{owner}/{repo_slug}/intel/languages")
async def intel_languages_page(
request: Request,
owner: SlugParam,
repo_slug: SlugParam,
sort: str = "symbols",
show: str = "all",
db: AsyncSession = Depends(get_db),
) -> Response:
"""Language composition dashboard — files, symbols, and kind breakdown per language.
Reads exclusively from ``musehub_intel_languages``; no ``muse code``
subprocess runs at request time. Each row represents one language detected
in the HEAD snapshot, including files with no symbols (assets, config, docs).
Parameters
----------
sort : str
Column to sort by: ``"name"`` (alphabetical, default), ``"files"``
(file_count DESC), or ``"symbols"`` (symbol_count DESC). Unknown values
are coerced to ``"symbols"`` so the page never 400s on a bad param.
show : str
Filter mode: ``"all"`` includes every language regardless of symbol count;
``"code"`` restricts to languages with at least one symbol. Unknown
values coerced to ``"all"``.
Returns
-------
Response
Rendered ``intel_languages.html`` with context keys:
``languages``, ``total_languages``, ``total_symbols``, ``total_files``,
``code_languages``, ``selected_sort``, ``selected_show``,
``valid_sorts``, ``valid_shows``.
Raises
------
404
Repository not found or inaccessible.
"""
Context dict must include pre-computed per-row values:
- `pct_bar` — integer 0–100 for CSS bar width (same as `pct` rounded)
- `kinds_chips` — list of `{kind, count, label}` dicts for template loops
- `file_label` — `"file"` or `"files"` (grammatical singular/plural)
Phase 5 — Template `intel_languages.html`
File: `musehub/templates/musehub/pages/intel_languages.html`
Extends: musehub/base.html
Wrapper: <div class="intel-wrap"> ← same as all intel subpages
Key blocks:
- Subheader: `intel-subhd-title--spectral` · `icon("globe", 16)` in `--color-teal` · "Languages"
- Stat chips (4): Total Languages · Total Symbols · Total Files · Languages With Code
- Sort/filter bar: Sort pills (Name / Files / Symbols) + Show toggle (All / Code only)
- Language table: one expanded row per language with:
- Language name (large, primary text)
- File count + symbol count (mono, muted)
- Spectral gradient bar (proportional to symbol_count / max_symbol_count)
- Percentage badge
- Kind chips row: each kind shown as a mini badge with count
- Empty state: globe icon + "No language data yet — push a commit."
Phase 6 — SCSS
Files:
- `src/scss/pages/_languages.scss` — layout only (`0 colors, 0 typography`)
- `src/scss/components/_languages.scss` — visual only
CSS namespace: `.ln-`
// pages/_languages.scss — structural layout
.ln-table { display: flex; flex-direction: column; gap: 2px; margin-top: 1rem; }
.ln-row { display: grid; grid-template-columns: 14rem 7rem 7rem 1fr 5rem; ... }
.ln-bar-wrap { height: 6px; border-radius: 3px; overflow: hidden; }
.ln-bar { height: 100%; border-radius: 3px; }
.ln-kinds-row { display: flex; flex-wrap: wrap; gap: 0.35rem; margin-top: 0.35rem; }
// components/_languages.scss — visual
.ln-lang-name { font-weight: 700; color: var(--text-primary); }
.ln-bar { background: var(--gradient-spectral); } // ← spectral gradient
.ln-pct-badge { font-family: var(--font-mono); color: var(--text-muted); }
.ln-kind-chip { background: var(--bg-surface); border: 1px solid var(--border-default);
border-radius: 4px; font-size: 0.6rem; font-family: var(--font-mono);
padding: 1px 6px; color: var(--text-muted); }
.ln-row--no-code { opacity: 0.45; } // languages with 0 symbols dimmed
Phase 7 — Dashboard Card + Wire-Up
Files:
- `musehub/templates/musehub/pages/intel_dashboard.html` — add card after API surface
- `musehub/api/routes/musehub/ui_intel.py` — add `languages_count` + `languages_preview` to dashboard ctx
Dashboard card design:
┌────────────────────────────────────┐
│ ◈ LANGUAGES View all → │
│ 21 languages · 12,544 symbols │
│ │
│ Python ████████████████ 86.7% │
│ Markdown ██░░░░░░░░░░░░░ 11.9% │
│ HTML █░░░░░░░░░░░░░░ 1.0% │
└────────────────────────────────────┘
Show top 3 languages by symbol count with proportional mini-bars.
Spectral Theme — What to Use
| Element | Token |
|---|---|
| Page accent / icon | `var(--color-teal)` + `icon("globe", 16)` |
| Subheader gradient text | `intel-subhd-title--spectral` (CSS class) |
| Composition bars | `var(--gradient-spectral)` — the rainbow gradient |
| Stat chip values | `var(--gradient-spectral)` via `-webkit-background-clip: text` |
| Zero-symbol rows | `opacity: 0.45` — visually de-emphasised, not hidden |
| Kind chips | `var(--bg-surface)` + `var(--border-default)` — neutral, not colored |
| Dashboard mini-bars | `var(--gradient-spectral)` |
| Percentage badge | `var(--font-mono)` + `var(--text-muted)` |
| Sort pill active | `intel-filter-pill--active` (shared class) |
Docstring Standard
Every class, method, and function added in this ticket carries NumPy-style docstrings with: one-line summary, `Parameters` section, `Returns` section, `Raises` section where applicable, and a `Notes` section for non-obvious invariants. See `ApiSurfaceProvider` and `intel_api_surface_page` as the canonical templates. No multi-paragraph inline comments — docstrings only.
Files to Create / Modify
| File | Action |
|---|---|
| `alembic/versions/XXXX_add_kinds_json_to_intel_languages.py` | create |
| `musehub/db/musehub_models.py` | add `kinds_json` to `MusehubIntelLanguages` |
| `musehub/services/musehub_intel_providers.py` | rewrite `LanguagesProvider` — delete `_run_muse` call |
| `tests/test_intel_languages.py` | create (30 tests, TDD-first, all RED) |
| `musehub/api/routes/musehub/ui_intel.py` | add `intel_languages_page` + dashboard ctx |
| `musehub/templates/musehub/pages/intel_languages.html` | create |
| `musehub/templates/musehub/pages/intel_dashboard.html` | add languages card |
| `src/scss/pages/_languages.scss` | create |
| `src/scss/components/_languages.scss` | create |
| `src/scss/app.scss` | add `@use` imports |
Acceptance Criteria
- Phase 1: Migration applies cleanly; `kinds_json` column exists in `musehub_intel_languages`
- Phase 2: `LanguagesProvider` uses ONLY `language_of` + `parse_symbols` + snapshot manifest — zero `_run_muse` calls, zero subprocess
- Phase 2: Single batch-upsert statement (all languages ≤ 20 rows typically)
- Phase 3: All 30 tests written RED before Phase 4 begins; all GREEN after
- Phase 4: Route returns 200 with data, 404 for unknown repo, coerces bad sort/show params
- Phase 5: Template renders 4 stat chips, sort bar, language rows with spectral bars and kind chips, empty state
- Phase 6: SCSS compiles clean, spectral gradient on bars and stat values
- Phase 7: Dashboard card shows top 3 languages with mini-bars and total count
- Page live on staging at `/{owner}/{repo}/intel/languages`
- `muse code languages` subprocess NEVER called in production code path
✅ Phase 3–7 Complete — all 30 tests GREEN, deployed to staging
What shipped
Phase 1 — Migration 0014: kinds_json JSONB column on musehub_intel_languages
Phase 2 — LanguagesProvider rewrite: pure language_of() + parse_symbols() from stored snapshot objects — zero subprocesses, zero on-disk repo required
Phase 3 — TDD: 30 tests across 7 tiers (DB model, provider, route, E2E HTML, data integrity, performance, security) — all RED before implementation
Phase 4 — Route handler intel_languages_page at /{owner}/{repo_slug}/intel/languages with sort (pct/files/symbols) and top (20/50/100) params
Phase 5 — Template intel_languages.html: stat chips, spectral bar per language, kind chips (function/class/method/async_fn/async_method), empty state
Phase 6 — SCSS (.ln- namespace): components/_languages.scss (visual) + pages/_languages.scss (grid layout, responsive at 700px)
Phase 7 — Dashboard card wired: top-5-by-files preview with spectral bars, link to /intel/languages
Tests
30/30 passed (8.07s)
T01-T04 DB model: columns, nullable kinds_json, composite PK, cascade delete
T05-T09 Provider: no-subprocess, file counts, kinds_json, pct sum, empty snapshot
T10-T17 Route: 200/empty/404, sort params, top limit, 422 on bad input
T18-T21 E2E HTML: language names, pct bar widths, kind chips, dashboard link
T22-T24 Data integrity: no duplicates, overwrite upsert, cross-repo isolation
T25-T27 Performance: provider <2s/100 files, route <200ms/50 rows, index check
T28-T30 Security: XSS escape, SQL injection in sort param, no-500 on bad path
Live
- Languages page: https://staging.musehub.ai/gabriel/musehub/intel/languages
- Dashboard card: https://staging.musehub.ai/gabriel/musehub/intel
- Deploy tag:
7c3b2a0c-20260503141150
Cross-reference: issue #66 covers the full proposal type × strategy × history matrix, including the canonical naming of all 7 ProposalType enum values. Phase 2 of #66 aligns with Phase 3 of this issue (state_merge → merge rename + migration).
Phase 1 complete ✅
Migration
0014_languages_kinds_json.pyapplied.kinds_json JSONBcolumn live inmusehub_intel_languages. ORM model updated with full docstring. Working ontask/intel-languagesbranch.Proceeding to Phase 2 — provider rewrite.