gabriel / musehub public
db-management.md markdown
198 lines 6.0 KB
Raw
sha256:3ff9c9863a9891bdcde71b4a43228f66d0493e38b7cc1d09fe9eb7de774046b2 feat: add repair-commit wire endpoint (API parity with repa… Opus 4.8 minor ⚠ breaking 1 day ago

MuseHub — Database Management

Everything an agent needs to work with the database: migrations, backfills, inspection, and the container topology. Read this before touching schema or data.


Container topology

Container Role
musehub App server — runs migrations, backfill scripts, alembic
musehub_postgres PostgreSQL — where the data lives
musehub_worker Background worker
musehub_runner Task runner

The DB is inside musehub_postgres, not on the host. Never try to connect from the host with psql or alembic directly — they can't reach the socket. All DB operations go through the musehub app container.

# Check which containers are running
docker ps --format "{{.Names}} {{.Status}}"

Running migrations

Alembic is configured inside the container. Run it with docker exec:

# Apply all pending migrations
docker exec musehub python3 -m alembic upgrade head

# Check current revision
docker exec musehub python3 -m alembic current

# Show pending migrations
docker exec musehub python3 -m alembic history --indicate-current

# Downgrade one step
docker exec musehub python3 -m alembic downgrade -1

# Downgrade to a specific revision
docker exec musehub python3 -m alembic downgrade 0015

Migrations live in alembic/versions/ and are numbered sequentially: 0001_…py, 0002_…py, etc. Name new migrations <next_num>_<description>.py.


Writing a migration

"""Short description of what this migration does.

Revision ID: 0019
Revises: 0018
Create Date: YYYY-MM-DD
"""
from __future__ import annotations
from alembic import op
import sqlalchemy as sa

revision = "0019"
down_revision = "0018"
branch_labels = None
depends_on = None

def upgrade() -> None:
    op.add_column(
        "musehub_some_table",
        sa.Column("new_col", sa.JSON(), nullable=True),
    )

def downgrade() -> None:
    op.drop_column("musehub_some_table", "new_col")

Always implement downgrade(). Always set down_revision to the previous migration's revision string. Check with docker exec musehub python3 -m alembic current before writing down_revision if unsure.


Running backfill scripts

One-off data fixes live in deploy/. They are designed to run inside the container:

# Dry-run first — always
docker exec musehub python3 /app/deploy/backfill_genesis_ops.py --dry-run
docker exec musehub python3 /app/deploy/backfill_content_ids_from_snapshots.py --dry-run

# Then run for real
docker exec musehub python3 /app/deploy/backfill_genesis_ops.py
docker exec musehub python3 /app/deploy/backfill_content_ids_from_snapshots.py

# Scope to a single repo
docker exec musehub python3 /app/deploy/backfill_genesis_ops.py --repo-id <repo_id>

The pattern for every backfill script:

  • --dry-run counts rows without writing
  • --repo-id limits to one repo
  • -q suppresses progress output
  • Idempotent — safe to run multiple times

Inspecting the database

# Open a psql session inside the postgres container
docker exec -it musehub_postgres psql -U musehub -d musehub

# Run a one-shot query without an interactive session
docker exec musehub_postgres psql -U musehub -d musehub -c "SELECT COUNT(*) FROM musehub_symbol_history_entries;"

# Useful inspection queries:

# Current alembic revision
docker exec musehub_postgres psql -U musehub -d musehub \
  -c "SELECT version_num FROM alembic_version;"

# Row counts for key tables
docker exec musehub_postgres psql -U musehub -d musehub -c "
SELECT
  relname AS table,
  n_live_tup AS rows
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 20;
"

# Check a specific symbol's history
docker exec musehub_postgres psql -U musehub -d musehub -c "
SELECT address, op, content_id, committed_at
FROM musehub_symbol_history_entries
WHERE address = 'src/app.ts'
ORDER BY committed_at ASC
LIMIT 10;
"

# Find entries with missing content_id (for backfill diagnosis)
docker exec musehub_postgres psql -U musehub -d musehub -c "
SELECT COUNT(*) FROM musehub_symbol_history_entries
WHERE content_id IS NULL AND address NOT LIKE '%::%';
"

Schema overview

Key tables and what they hold:

Table Primary key Purpose
musehub_repos repo_id Repo metadata
musehub_commits commit_id Push records; commit_meta JSON has structured_delta
musehub_snapshots snapshot_id File-tree manifest blobs (msgpack {path: content_id})
musehub_objects object_id Content-addressed binary objects
musehub_symbol_history_entries (repo_id, address, commit_id) One row per symbol × commit; op is raw DomainOp type; op_payload is full op dict
musehub_symbol_intel (repo_id, address) Per-symbol metrics: churn, gravity, blast
musehub_hash_occurrence_entries (content_id, repo_id, address) Clone detection index
musehub_intel_results (repo_id, intel_type) Aggregate intel blobs (snapshot, summary)

musehub_commits.commit_meta JSON shape:

{
  "structured_delta": {
    "ops": [
      {"op": "insert", "address": "src/app.py::Foo", "content_id": "sha256:...", ...}
    ]
  },
  "sem_ver_bump": "patch",
  "agent_id": "claude-code",
  "model_id": "claude-sonnet-4-6"
}

musehub_symbol_history_entries.op stores the raw DomainOp type verbatim: insert, delete, replace, patch, mutate, move, directory_rename.


Common agent mistakes

Running alembic from the host — fails with ModuleNotFoundError: No module named 'musehub' or DATABASE_URL must be set. Always docker exec musehub python3 -m alembic ....

Running backfill scripts from the host — same problem. Always docker exec musehub python3 /app/deploy/....

Connecting psql to localhost — the DB port is not exposed to the host by default. Use docker exec musehub_postgres psql ....

Wrong container name — the active app container is musehub (not musehub-blue or musehub-green; those are staging/prod slot names). Use docker ps to confirm.

File History 1 commit
sha256:3ff9c9863a9891bdcde71b4a43228f66d0493e38b7cc1d09fe9eb7de774046b2 feat: add repair-commit wire endpoint (API parity with repa… Opus 4.8 minor 1 day ago