Open #53 data-model

filed by gabriel human · 34 days ago

Data model lockdown: full audit, consolidation, and zero-drift enforcement

0 Anchors

— Blast radius

— Churn 30d

0 Proposals

Why this exists

MuseHub has been built at high velocity over two months. The data model has accumulated significant sprawl: 70 ORM tables across 7 files, a 2236-line monolith model file, 55 migrations including 7 repair migrations (0028, 0035, 0036, 0037, 0038, 0039, 0040), and live schema drift that is currently blocking staging deployment. This issue closes that debt permanently.

The goal: after this issue is closed, alembic check produces zero meaningful differences, assert_schema_matches_orm passes every deploy, and we have a documented policy that prevents future drift from accumulating silently.

Scope of known problems (pre-audit)

Live drift blocking staging deploy right now

musehub_issue_labels — table exists in staging DB, dropped from ORM in migration 0026, still present (migration did not run cleanly on staging)
musehub_auth_challenges.challenge_id — column removed from ORM, no migration written
uq_musehub_auth_challenges_nonce_hex — constraint removed from ORM, no migration written
muse_commits.commit_metadata — ORM says JSONB, DB has JSON, no migration written
muse_snapshots.manifest — ORM says JSONB, DB has JSON, no migration written
30+ server_default modify_default items — likely benign noise from Alembic's comparison of equivalent expressions, but need verification

Structural debt

musehub_models.py is 2236 lines and contains 56 tables — a maintenance hazard
models.py exists but appears empty — dead file
7 repair migrations in a 55-migration history — symptom of recurring schema instability
No index audit has ever been done — high-traffic query paths may be missing indexes
assert_schema_matches_orm only checks table/column presence and nullability — does not catch type drift or index drift

Phases

Phase 1 — Fix the live drift, unblock staging deploy

TDD first: write a test that runs alembic check against a fresh local DB and asserts zero meaningful differences (excluding known-benign server_default noise). The test must fail before the fix and pass after.

Then:

Audit each item in the known drift list above against the actual staging DB
Write migration 0056 that resolves all real differences in one shot
Verify alembic check output is clean against both local and staging DB
Redeploy staging — health check must pass

Gate: alembic check exits 0 (or our filtered equivalent exits 0) against a fresh DB after alembic upgrade head.

Phase 2 — Full table audit: remove the dead, consolidate the sprawl

For every table in the ORM (all 70):

Is it queried anywhere in the application? (muse code impact)
Is it written anywhere? If neither — candidate for deletion
Does it have data on staging that matters? If so, migrate it out first
Does it duplicate another table's responsibility?

Known candidates to investigate:

musehub_snapshot_entries — snapshots now use manifest_blob (msgpack), entries table may be orphaned
musehub_hash_occurrence_entries — intel subsystem, check if actively populated
musehub_intel_codemap_modules / musehub_intel_codemap_meta — codemap feature status?
musehub_wire_tags — what is this? Check usage
musehub_bridge_mirrors — git bridge feature, is it live?
musehub_mpay_claims — MPay feature status?
musehub_attestations / musehub_attestation_claim_types — attestation feature status?
musehub_profile_snapshots — is this actively populated or orphaned?
musehub_sessions — vs musehub_auth_challenges / musehub_auth_keys — are all three needed?

Gate: every surviving table has at least one active read path in the application code. Deleted tables have a DROP migration. No orphaned ORM classes.

Phase 3 — Index audit: every hot query path has the right index

For every table that survives Phase 2, audit:

All WHERE clause columns in hot queries (push, fetch, wire, issue list, proposal list)
All JOIN columns
All ORDER BY columns used in pagination
Foreign keys that lack a corresponding index (PostgreSQL does not auto-index FKs)

Known missing/suspect indexes to check:

musehub_object_refs(repo_id) — added in 0052, verify coverage
musehub_commits(repo_id, branch) — push and fetch walk this constantly
musehub_snapshots(repo_id) — same
musehub_issues(repo_id, state) — issue list query
musehub_proposals(repo_id, state) — proposal list query
musehub_symbol_intel(repo_id, commit_id) — intel indexing hot path
All created_at columns used in time-range queries

Gate: EXPLAIN ANALYZE on the top-10 hot queries shows index scans, not seq scans, on tables with >1000 rows.

Phase 4 — Consolidate musehub_models.py

The 2236-line monolith makes the model hard to navigate and review. Split by domain:

Proposed split:

musehub_repo_models.py — repo, branch, commit, snapshot, object, object_ref, snapshot_entries
musehub_social_models.py — issue, issue_comment, issue_event, proposal, proposal_review, proposal_comment, proposal_dependency, proposal_simulation, proposal_labels
musehub_release_models.py — release, release_asset
musehub_webhook_models.py — webhook, webhook_delivery
musehub_intel_models.py — all musehub_intel_* and musehub_symbol_* tables
musehub_identity_models.py — identity (moved from musehub_auth_models.py or kept separate)
musehub_abuse_models.py — blocked_hashes, daily_push_bytes, push_anomalies, stream_rejections
musehub_jobs_models.py — background_job
Keep existing: musehub_auth_models.py, musehub_collaborator_models.py, musehub_domain_models.py, musehub_label_models.py, coord_models.py, muse_cli_models.py

No behavior changes — pure file reorganization. All imports updated. Monolith deleted.

Gate: all existing tests pass. No import of musehub_models.py anywhere in the codebase.

Phase 5 — Harden assert_schema_matches_orm + deploy gate

Current assert_schema_matches_orm only checks table/column presence and nullability. Extend it:

Type checking for critical columns (JSONB vs JSON, VARCHAR length, BOOLEAN vs INTEGER)
Index existence check for declared Index(...) objects in the ORM
Foreign key existence check

Also harden the deploy gate:

Replace the alembic check warning-only step in deploy.sh with a hard fail using our custom filter (exclude known-benign server_default noise, fail on everything else)
Add a pre-deploy test that runs alembic upgrade head + alembic check against a fresh local DB as part of CI

Gate: assert_schema_matches_orm catches JSON→JSONB drift, missing indexes, and FK mismatches. deploy.sh hard-fails on any meaningful schema difference before the new container ever starts.

Definition of done

alembic check (filtered) exits 0 on staging after alembic upgrade head
assert_schema_matches_orm passes on staging startup
No repair migrations added after this issue closes (new drift → new migration, same-day)
Every ORM table has a documented read path
Every hot query path has an index
musehub_models.py deleted — replaced by domain-split files
Deploy hard-fails on schema drift before container start
Zero manual alembic stamp interventions required on staging ever again

◎ Activity4

●

gabriel opened this issue 34 days ago

○

gabriel 34 days ago

Phase 1 complete

What was done

Audited staging DB state directly via SSM: alembic is at revision 0019 — 36 migrations behind head, not partially-applied drift. The schema divergence flagged by alembic check on staging is entirely explained by the migration gap. A fresh local DB with all 55 migrations applied is clean.

Tests landed on dev

S3 (test_s3_alembic_check_exits_clean) — calls alembic.command.check() on a fresh migration-applied DB, the exact gate deploy.sh uses. Passes now, guards against future ORM↔migration drift.
Stale constants fixed — _EXPECTED_HEAD and _EXPECTED_MIGRATION_COUNT updated from 0051/51 to 49036ecac3d6/55. These two tests were failing before this change.

All 9 schema tests green: S1, S2, S3 + 6 structural alembic tests.

Migration 0056 status

Not written. The original plan assumed partially-applied migrations that needed idempotent repair. The actual situation is simpler: staging is cleanly behind and will be nuked and rebuilt fresh. A repair migration would be a no-op on every correctly-deployed instance.

Staging deploy plan

pg_dump already captured to Desktop (153 MB, complete backup)
After all phases complete locally: nuke staging DB, alembic upgrade head on empty schema, deploy, re-import issues/proposals from backup

Phase 1 gate: PASSED — alembic check exits 0 against a fresh DB after alembic upgrade head.

○

gabriel 34 days ago

Phase 2 complete

Audit scope: all 69 ORM tables checked for active read/write paths using muse code impact + muse content-grep.

Dead tables found and dropped (2)

Table	ORM class	Reason
`muse_objects`	`MuseCliObject`	CLI blob cache, never implemented — roadmap E-01 confirms "all currently zero"
`muse_tags`	`MuseCliTag`	Music-semantic commit tags, superseded by `musehub_repos.tags` JSON. Docstring in `ui.py` was stale; actual query never touched the table

Migration 0056: drops both tables (idempotent downgrade restores them).

Dead file deleted: musehub/db/models.py — contained only a tombstone comment, imported nowhere meaningful.

Other cleanup: dead re-exports removed from musehub/muse_cli/models.py, stale docstring in ui.py corrected, all from musehub.db import models imports removed from env.py, database.py, and test_schema_check.py.

Notable non-candidates (agent flagged, verified active):

musehub_attestations / musehub_mpay_claims — use raw text() SQL, invisible to muse code impact, confirmed active via content-grep
musehub_snapshot_entries / musehub_hash_occurrence_entries — same issue with inline imports; both actively written and read
musehub_sessions — 87 callers, fully active (MCP, session service)

Phase 2 gate: PASSED — every surviving table (67) has at least one active read path. Migration 0056 drops the 2 dead tables.

○

gabriel 34 days ago

Phase 3 complete

Scope: 69 surviving tables audited for index coverage against hot query patterns in push, fetch, wire, issue, proposal, and symbol indexer paths.

Finding: one real gap.

Table	Gap	Query	Fix
`musehub_branches`	Only single-column `repo_id` index	`WHERE repo_id = ? AND name = ?` (point lookup on every push/fetch) and `ORDER BY name` (list branches)	Composite `(repo_id, name)` index

Migration 0057: creates ix_musehub_branches_repo_name (repo_id, name), drops the now-redundant single-column ix_musehub_branches_repo_id (covered by the leftmost prefix of the composite).

Everything else confirmed covered:

musehub_commits: composite (repo_id, branch) and (repo_id, timestamp) — push/fetch HEAD lookups covered
musehub_object_refs: composite PK (repo_id, object_id) + ix_musehub_object_refs_object_id — bitmap IN-list scans covered (added 0052)
musehub_issues / musehub_proposals: composite (repo_id, state), (repo_id, number) indexes present
musehub_symbol_history_entries: composite (repo_id, address) and (repo_id, address, committed_at) present
All FK columns: every FK has index=True or is part of a composite PK — no unindexed FK gaps

Phase 3 gate: PASSED — all confirmed hot query paths covered by indexes. No seq scans on branch name lookups after migration 0057.

○

gabriel 34 days ago

Phase 4 — musehub_models.py split ✅

Completed. The 2240-line monolith has been split into 8 domain-specific files:

File	Domain	Classes
`musehub_repo_models.py`	Repos, VCS objects	11
`musehub_social_models.py`	Issues, proposals, forks	9
`musehub_release_models.py`	Releases, assets	2
`musehub_webhook_models.py`	Webhooks, deliveries	2
`musehub_intel_models.py`	All intel_* and symbol_* tables	23
`musehub_identity_models.py`	Identities, attestations, mpay, profiles	5
`musehub_abuse_models.py`	Content safety, rate limits	4
`musehub_jobs_models.py`	Background jobs	1

musehub_models.py is now a 95-line pure re-export shim — all 56 classes remain importable from the original namespace so the 314 existing call sites continue to work unchanged during incremental migration.

Gate: 79 model/schema/DB tests pass, 18 alembic/schema-check tests pass.

Assignee

gabriel human

Release

no commits linked to this issue

create

muse hub issue create \
  --title "..." \
  --body "..." \
  --label bug \
  --anchor path/to/file.py::Symbol \
  --commit-anchor <sha> \
  --repo gabriel/musehub

read

muse hub issue get 53 --json
muse hub issue list --state open --json

update

muse hub issue edit 53 \
  --anchor path/to/file.py::Symbol \
  --repo gabriel/musehub

comment

muse hub issue comment 53 \
  --body "Fixed in <sha>" \
  --repo gabriel/musehub

muse hub issue close 53 \
  --repo gabriel/musehub

create

create_issue({
  repo_id: "sha256:a265796360c3b1b8700b5682ced5f6b044a2c0d3a2c58918892a5aa494db6c75",
  title: "...",
  body: "...",
  labels: ["bug"],
  symbol_anchors: [
    "path/to/file.py::Symbol"
  ],
  commit_anchors: ["<sha>"]
})

read

get_issue({
  repo_id: "sha256:a265796360c3b1b8700b5682ced5f6b044a2c0d3a2c58918892a5aa494db6c75",
  issue_number: 53
})

list_issues({
  repo_id: "sha256:a265796360c3b1b8700b5682ced5f6b044a2c0d3a2c58918892a5aa494db6c75",
  state: "open"
})

update

edit_issue({
  repo_id: "sha256:a265796360c3b1b8700b5682ced5f6b044a2c0d3a2c58918892a5aa494db6c75",
  issue_number: 53,
  symbol_anchors: ["path/to/file.py::Symbol"]
})

comment

create_issue_comment({
  repo_id: "sha256:a265796360c3b1b8700b5682ced5f6b044a2c0d3a2c58918892a5aa494db6c75",
  issue_number: 53,
  body: "..."
})

close_issue({
  repo_id: "sha256:a265796360c3b1b8700b5682ced5f6b044a2c0d3a2c58918892a5aa494db6c75",
  issue_number: 53
})

create

curl -X POST \
  http://localhost:10003/api/repos/sha256:a265796360c3b1b8700b5682ced5f6b044a2c0d3a2c58918892a5aa494db6c75/issues \
  -H "Content-Type: application/json" \
  -H "Authorization: MSign handle=\"...\" ts=... sig=\"...\"" \
  -d '{
    "title": "...",
    "body": "...",
    "labels": ["bug"],
    "symbol_anchors": ["path/to/file.py::Symbol"]
  }'

read

# get one issue
curl http://localhost:10003/api/repos/sha256:a265796360c3b1b8700b5682ced5f6b044a2c0d3a2c58918892a5aa494db6c75/issues/53

# list open issues
curl "http://localhost:10003/api/repos/sha256:a265796360c3b1b8700b5682ced5f6b044a2c0d3a2c58918892a5aa494db6c75/issues?state=open"

update

curl -X PATCH \
  http://localhost:10003/api/repos/sha256:a265796360c3b1b8700b5682ced5f6b044a2c0d3a2c58918892a5aa494db6c75/issues/53 \
  -H "Content-Type: application/json" \
  -H "Authorization: MSign handle=\"...\" ts=... sig=\"...\"" \
  -d '{"title": "...", "body": "..."}'

comment

curl -X POST \
  http://localhost:10003/api/repos/sha256:a265796360c3b1b8700b5682ced5f6b044a2c0d3a2c58918892a5aa494db6c75/issues/53/comments \
  -H "Content-Type: application/json" \
  -H "Authorization: MSign handle=\"...\" ts=... sig=\"...\"" \
  -d '{"body": "Fixed in <sha>"}'

curl -X POST \
  http://localhost:10003/api/repos/sha256:a265796360c3b1b8700b5682ced5f6b044a2c0d3a2c58918892a5aa494db6c75/issues/53/close \
  -H "Authorization: MSign handle=\"...\" ts=... sig=\"...\""