gabriel / musehub public
Open #53 data-model
filed by gabriel human · 34 days ago

Data model lockdown: full audit, consolidation, and zero-drift enforcement

0 Anchors
Blast radius
Churn 30d
0 Proposals

Why this exists

MuseHub has been built at high velocity over two months. The data model has accumulated significant sprawl: 70 ORM tables across 7 files, a 2236-line monolith model file, 55 migrations including 7 repair migrations (0028, 0035, 0036, 0037, 0038, 0039, 0040), and live schema drift that is currently blocking staging deployment. This issue closes that debt permanently.

The goal: after this issue is closed, alembic check produces zero meaningful differences, assert_schema_matches_orm passes every deploy, and we have a documented policy that prevents future drift from accumulating silently.


Scope of known problems (pre-audit)

Live drift blocking staging deploy right now

  • musehub_issue_labels — table exists in staging DB, dropped from ORM in migration 0026, still present (migration did not run cleanly on staging)
  • musehub_auth_challenges.challenge_id — column removed from ORM, no migration written
  • uq_musehub_auth_challenges_nonce_hex — constraint removed from ORM, no migration written
  • muse_commits.commit_metadata — ORM says JSONB, DB has JSON, no migration written
  • muse_snapshots.manifest — ORM says JSONB, DB has JSON, no migration written
  • 30+ server_default modify_default items — likely benign noise from Alembic's comparison of equivalent expressions, but need verification

Structural debt

  • musehub_models.py is 2236 lines and contains 56 tables — a maintenance hazard
  • models.py exists but appears empty — dead file
  • 7 repair migrations in a 55-migration history — symptom of recurring schema instability
  • No index audit has ever been done — high-traffic query paths may be missing indexes
  • assert_schema_matches_orm only checks table/column presence and nullability — does not catch type drift or index drift

Phases

Phase 1 — Fix the live drift, unblock staging deploy

TDD first: write a test that runs alembic check against a fresh local DB and asserts zero meaningful differences (excluding known-benign server_default noise). The test must fail before the fix and pass after.

Then:

  1. Audit each item in the known drift list above against the actual staging DB
  2. Write migration 0056 that resolves all real differences in one shot
  3. Verify alembic check output is clean against both local and staging DB
  4. Redeploy staging — health check must pass

Gate: alembic check exits 0 (or our filtered equivalent exits 0) against a fresh DB after alembic upgrade head.


Phase 2 — Full table audit: remove the dead, consolidate the sprawl

For every table in the ORM (all 70):

  • Is it queried anywhere in the application? (muse code impact)
  • Is it written anywhere? If neither — candidate for deletion
  • Does it have data on staging that matters? If so, migrate it out first
  • Does it duplicate another table's responsibility?

Known candidates to investigate:

  • musehub_snapshot_entries — snapshots now use manifest_blob (msgpack), entries table may be orphaned
  • musehub_hash_occurrence_entries — intel subsystem, check if actively populated
  • musehub_intel_codemap_modules / musehub_intel_codemap_meta — codemap feature status?
  • musehub_wire_tags — what is this? Check usage
  • musehub_bridge_mirrors — git bridge feature, is it live?
  • musehub_mpay_claims — MPay feature status?
  • musehub_attestations / musehub_attestation_claim_types — attestation feature status?
  • musehub_profile_snapshots — is this actively populated or orphaned?
  • musehub_sessions — vs musehub_auth_challenges / musehub_auth_keys — are all three needed?

Gate: every surviving table has at least one active read path in the application code. Deleted tables have a DROP migration. No orphaned ORM classes.


Phase 3 — Index audit: every hot query path has the right index

For every table that survives Phase 2, audit:

  • All WHERE clause columns in hot queries (push, fetch, wire, issue list, proposal list)
  • All JOIN columns
  • All ORDER BY columns used in pagination
  • Foreign keys that lack a corresponding index (PostgreSQL does not auto-index FKs)

Known missing/suspect indexes to check:

  • musehub_object_refs(repo_id) — added in 0052, verify coverage
  • musehub_commits(repo_id, branch) — push and fetch walk this constantly
  • musehub_snapshots(repo_id) — same
  • musehub_issues(repo_id, state) — issue list query
  • musehub_proposals(repo_id, state) — proposal list query
  • musehub_symbol_intel(repo_id, commit_id) — intel indexing hot path
  • All created_at columns used in time-range queries

Gate: EXPLAIN ANALYZE on the top-10 hot queries shows index scans, not seq scans, on tables with >1000 rows.


Phase 4 — Consolidate musehub_models.py

The 2236-line monolith makes the model hard to navigate and review. Split by domain:

Proposed split:

  • musehub_repo_models.py — repo, branch, commit, snapshot, object, object_ref, snapshot_entries
  • musehub_social_models.py — issue, issue_comment, issue_event, proposal, proposal_review, proposal_comment, proposal_dependency, proposal_simulation, proposal_labels
  • musehub_release_models.py — release, release_asset
  • musehub_webhook_models.py — webhook, webhook_delivery
  • musehub_intel_models.py — all musehub_intel_* and musehub_symbol_* tables
  • musehub_identity_models.py — identity (moved from musehub_auth_models.py or kept separate)
  • musehub_abuse_models.py — blocked_hashes, daily_push_bytes, push_anomalies, stream_rejections
  • musehub_jobs_models.py — background_job
  • Keep existing: musehub_auth_models.py, musehub_collaborator_models.py, musehub_domain_models.py, musehub_label_models.py, coord_models.py, muse_cli_models.py

No behavior changes — pure file reorganization. All imports updated. Monolith deleted.

Gate: all existing tests pass. No import of musehub_models.py anywhere in the codebase.


Phase 5 — Harden assert_schema_matches_orm + deploy gate

Current assert_schema_matches_orm only checks table/column presence and nullability. Extend it:

  • Type checking for critical columns (JSONB vs JSON, VARCHAR length, BOOLEAN vs INTEGER)
  • Index existence check for declared Index(...) objects in the ORM
  • Foreign key existence check

Also harden the deploy gate:

  • Replace the alembic check warning-only step in deploy.sh with a hard fail using our custom filter (exclude known-benign server_default noise, fail on everything else)
  • Add a pre-deploy test that runs alembic upgrade head + alembic check against a fresh local DB as part of CI

Gate: assert_schema_matches_orm catches JSON→JSONB drift, missing indexes, and FK mismatches. deploy.sh hard-fails on any meaningful schema difference before the new container ever starts.


Definition of done

  • alembic check (filtered) exits 0 on staging after alembic upgrade head
  • assert_schema_matches_orm passes on staging startup
  • No repair migrations added after this issue closes (new drift → new migration, same-day)
  • Every ORM table has a documented read path
  • Every hot query path has an index
  • musehub_models.py deleted — replaced by domain-split files
  • Deploy hard-fails on schema drift before container start
  • Zero manual alembic stamp interventions required on staging ever again
Activity4
gabriel opened this issue 34 days ago
gabriel 34 days ago

Phase 1 complete

What was done

Audited staging DB state directly via SSM: alembic is at revision 0019 — 36 migrations behind head, not partially-applied drift. The schema divergence flagged by alembic check on staging is entirely explained by the migration gap. A fresh local DB with all 55 migrations applied is clean.

Tests landed on dev

  • S3 (test_s3_alembic_check_exits_clean) — calls alembic.command.check() on a fresh migration-applied DB, the exact gate deploy.sh uses. Passes now, guards against future ORM↔migration drift.
  • Stale constants fixed_EXPECTED_HEAD and _EXPECTED_MIGRATION_COUNT updated from 0051/51 to 49036ecac3d6/55. These two tests were failing before this change.

All 9 schema tests green: S1, S2, S3 + 6 structural alembic tests.

Migration 0056 status

Not written. The original plan assumed partially-applied migrations that needed idempotent repair. The actual situation is simpler: staging is cleanly behind and will be nuked and rebuilt fresh. A repair migration would be a no-op on every correctly-deployed instance.

Staging deploy plan

  1. pg_dump already captured to Desktop (153 MB, complete backup)
  2. After all phases complete locally: nuke staging DB, alembic upgrade head on empty schema, deploy, re-import issues/proposals from backup

Phase 1 gate: PASSEDalembic check exits 0 against a fresh DB after alembic upgrade head.

gabriel 34 days ago

Phase 2 complete

Audit scope: all 69 ORM tables checked for active read/write paths using muse code impact + muse content-grep.

Dead tables found and dropped (2)

Table ORM class Reason
muse_objects MuseCliObject CLI blob cache, never implemented — roadmap E-01 confirms "all currently zero"
muse_tags MuseCliTag Music-semantic commit tags, superseded by musehub_repos.tags JSON. Docstring in ui.py was stale; actual query never touched the table

Migration 0056: drops both tables (idempotent downgrade restores them).

Dead file deleted: musehub/db/models.py — contained only a tombstone comment, imported nowhere meaningful.

Other cleanup: dead re-exports removed from musehub/muse_cli/models.py, stale docstring in ui.py corrected, all from musehub.db import models imports removed from env.py, database.py, and test_schema_check.py.

Notable non-candidates (agent flagged, verified active):

  • musehub_attestations / musehub_mpay_claims — use raw text() SQL, invisible to muse code impact, confirmed active via content-grep
  • musehub_snapshot_entries / musehub_hash_occurrence_entries — same issue with inline imports; both actively written and read
  • musehub_sessions — 87 callers, fully active (MCP, session service)

Phase 2 gate: PASSED — every surviving table (67) has at least one active read path. Migration 0056 drops the 2 dead tables.

gabriel 34 days ago

Phase 3 complete

Scope: 69 surviving tables audited for index coverage against hot query patterns in push, fetch, wire, issue, proposal, and symbol indexer paths.

Finding: one real gap.

Table Gap Query Fix
musehub_branches Only single-column repo_id index WHERE repo_id = ? AND name = ? (point lookup on every push/fetch) and ORDER BY name (list branches) Composite (repo_id, name) index

Migration 0057: creates ix_musehub_branches_repo_name (repo_id, name), drops the now-redundant single-column ix_musehub_branches_repo_id (covered by the leftmost prefix of the composite).

Everything else confirmed covered:

  • musehub_commits: composite (repo_id, branch) and (repo_id, timestamp) — push/fetch HEAD lookups covered
  • musehub_object_refs: composite PK (repo_id, object_id) + ix_musehub_object_refs_object_id — bitmap IN-list scans covered (added 0052)
  • musehub_issues / musehub_proposals: composite (repo_id, state), (repo_id, number) indexes present
  • musehub_symbol_history_entries: composite (repo_id, address) and (repo_id, address, committed_at) present
  • All FK columns: every FK has index=True or is part of a composite PK — no unindexed FK gaps

Phase 3 gate: PASSED — all confirmed hot query paths covered by indexes. No seq scans on branch name lookups after migration 0057.

gabriel 34 days ago

Phase 4 — musehub_models.py split ✅

Completed. The 2240-line monolith has been split into 8 domain-specific files:

File Domain Classes
musehub_repo_models.py Repos, VCS objects 11
musehub_social_models.py Issues, proposals, forks 9
musehub_release_models.py Releases, assets 2
musehub_webhook_models.py Webhooks, deliveries 2
musehub_intel_models.py All intel_* and symbol_* tables 23
musehub_identity_models.py Identities, attestations, mpay, profiles 5
musehub_abuse_models.py Content safety, rate limits 4
musehub_jobs_models.py Background jobs 1

musehub_models.py is now a 95-line pure re-export shim — all 56 classes remain importable from the original namespace so the 314 existing call sites continue to work unchanged during incremental migration.

Gate: 79 model/schema/DB tests pass, 18 alembic/schema-check tests pass.