# MuseHub — Database Management Everything an agent needs to work with the database: migrations, backfills, inspection, and the container topology. Read this before touching schema or data. --- ## Container topology | Container | Role | |-----------|------| | `musehub` | App server — runs migrations, backfill scripts, alembic | | `musehub_postgres` | PostgreSQL — where the data lives | | `musehub_worker` | Background worker | | `musehub_runner` | Task runner | **The DB is inside `musehub_postgres`, not on the host.** Never try to connect from the host with `psql` or `alembic` directly — they can't reach the socket. All DB operations go through the `musehub` app container. ```bash # Check which containers are running docker ps --format "{{.Names}} {{.Status}}" ``` --- ## Running migrations Alembic is configured inside the container. Run it with `docker exec`: ```bash # Apply all pending migrations docker exec musehub python3 -m alembic upgrade head # Check current revision docker exec musehub python3 -m alembic current # Show pending migrations docker exec musehub python3 -m alembic history --indicate-current # Downgrade one step docker exec musehub python3 -m alembic downgrade -1 # Downgrade to a specific revision docker exec musehub python3 -m alembic downgrade 0015 ``` Migrations live in `alembic/versions/` and are numbered sequentially: `0001_…py`, `0002_…py`, etc. Name new migrations `_.py`. --- ## Writing a migration ```python """Short description of what this migration does. Revision ID: 0019 Revises: 0018 Create Date: YYYY-MM-DD """ from __future__ import annotations from alembic import op import sqlalchemy as sa revision = "0019" down_revision = "0018" branch_labels = None depends_on = None def upgrade() -> None: op.add_column( "musehub_some_table", sa.Column("new_col", sa.JSON(), nullable=True), ) def downgrade() -> None: op.drop_column("musehub_some_table", "new_col") ``` Always implement `downgrade()`. Always set `down_revision` to the previous migration's `revision` string. Check with `docker exec musehub python3 -m alembic current` before writing `down_revision` if unsure. --- ## Running backfill scripts One-off data fixes live in `deploy/`. They are designed to run inside the container: ```bash # Dry-run first — always docker exec musehub python3 /app/deploy/backfill_genesis_ops.py --dry-run docker exec musehub python3 /app/deploy/backfill_content_ids_from_snapshots.py --dry-run # Then run for real docker exec musehub python3 /app/deploy/backfill_genesis_ops.py docker exec musehub python3 /app/deploy/backfill_content_ids_from_snapshots.py # Scope to a single repo docker exec musehub python3 /app/deploy/backfill_genesis_ops.py --repo-id ``` The pattern for every backfill script: - `--dry-run` counts rows without writing - `--repo-id` limits to one repo - `-q` suppresses progress output - Idempotent — safe to run multiple times --- ## Inspecting the database ```bash # Open a psql session inside the postgres container docker exec -it musehub_postgres psql -U musehub -d musehub # Run a one-shot query without an interactive session docker exec musehub_postgres psql -U musehub -d musehub -c "SELECT COUNT(*) FROM musehub_symbol_history_entries;" # Useful inspection queries: # Current alembic revision docker exec musehub_postgres psql -U musehub -d musehub \ -c "SELECT version_num FROM alembic_version;" # Row counts for key tables docker exec musehub_postgres psql -U musehub -d musehub -c " SELECT relname AS table, n_live_tup AS rows FROM pg_stat_user_tables ORDER BY n_live_tup DESC LIMIT 20; " # Check a specific symbol's history docker exec musehub_postgres psql -U musehub -d musehub -c " SELECT address, op, content_id, committed_at FROM musehub_symbol_history_entries WHERE address = 'src/app.ts' ORDER BY committed_at ASC LIMIT 10; " # Find entries with missing content_id (for backfill diagnosis) docker exec musehub_postgres psql -U musehub -d musehub -c " SELECT COUNT(*) FROM musehub_symbol_history_entries WHERE content_id IS NULL AND address NOT LIKE '%::%'; " ``` --- ## Schema overview Key tables and what they hold: | Table | Primary key | Purpose | |-------|-------------|---------| | `musehub_repos` | `repo_id` | Repo metadata | | `musehub_commits` | `commit_id` | Push records; `commit_meta` JSON has `structured_delta` | | `musehub_snapshots` | `snapshot_id` | File-tree manifest blobs (msgpack `{path: content_id}`) | | `musehub_objects` | `object_id` | Content-addressed binary objects | | `musehub_symbol_history_entries` | `(repo_id, address, commit_id)` | One row per symbol × commit; `op` is raw DomainOp type; `op_payload` is full op dict | | `musehub_symbol_intel` | `(repo_id, address)` | Per-symbol metrics: churn, gravity, blast | | `musehub_hash_occurrence_entries` | `(content_id, repo_id, address)` | Clone detection index | | `musehub_intel_results` | `(repo_id, intel_type)` | Aggregate intel blobs (snapshot, summary) | `musehub_commits.commit_meta` JSON shape: ```json { "structured_delta": { "ops": [ {"op": "insert", "address": "src/app.py::Foo", "content_id": "sha256:...", ...} ] }, "sem_ver_bump": "patch", "agent_id": "claude-code", "model_id": "claude-sonnet-4-6" } ``` `musehub_symbol_history_entries.op` stores the raw DomainOp type verbatim: `insert`, `delete`, `replace`, `patch`, `mutate`, `move`, `directory_rename`. --- ## Common agent mistakes **Running alembic from the host** — fails with `ModuleNotFoundError: No module named 'musehub'` or `DATABASE_URL must be set`. Always `docker exec musehub python3 -m alembic ...`. **Running backfill scripts from the host** — same problem. Always `docker exec musehub python3 /app/deploy/...`. **Connecting psql to localhost** — the DB port is not exposed to the host by default. Use `docker exec musehub_postgres psql ...`. **Wrong container name** — the active app container is `musehub` (not `musehub-blue` or `musehub-green`; those are staging/prod slot names). Use `docker ps` to confirm.