MuseHub — Database Management
Everything an agent needs to work with the database: migrations, backfills, inspection, and the container topology. Read this before touching schema or data.
Container topology
| Container | Role |
|---|---|
musehub |
App server — runs migrations, backfill scripts, alembic |
musehub_postgres |
PostgreSQL — where the data lives |
musehub_worker |
Background worker |
musehub_runner |
Task runner |
The DB is inside musehub_postgres, not on the host. Never try to connect
from the host with psql or alembic directly — they can't reach the socket.
All DB operations go through the musehub app container.
# Check which containers are running
docker ps --format "{{.Names}} {{.Status}}"
Running migrations
Alembic is configured inside the container. Run it with docker exec:
# Apply all pending migrations
docker exec musehub python3 -m alembic upgrade head
# Check current revision
docker exec musehub python3 -m alembic current
# Show pending migrations
docker exec musehub python3 -m alembic history --indicate-current
# Downgrade one step
docker exec musehub python3 -m alembic downgrade -1
# Downgrade to a specific revision
docker exec musehub python3 -m alembic downgrade 0015
Migrations live in alembic/versions/ and are numbered sequentially:
0001_…py, 0002_…py, etc. Name new migrations <next_num>_<description>.py.
Writing a migration
"""Short description of what this migration does.
Revision ID: 0019
Revises: 0018
Create Date: YYYY-MM-DD
"""
from __future__ import annotations
from alembic import op
import sqlalchemy as sa
revision = "0019"
down_revision = "0018"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"musehub_some_table",
sa.Column("new_col", sa.JSON(), nullable=True),
)
def downgrade() -> None:
op.drop_column("musehub_some_table", "new_col")
Always implement downgrade(). Always set down_revision to the previous
migration's revision string. Check with docker exec musehub python3 -m alembic current
before writing down_revision if unsure.
Running backfill scripts
One-off data fixes live in deploy/. They are designed to run inside the container:
# Dry-run first — always
docker exec musehub python3 /app/deploy/backfill_genesis_ops.py --dry-run
docker exec musehub python3 /app/deploy/backfill_content_ids_from_snapshots.py --dry-run
# Then run for real
docker exec musehub python3 /app/deploy/backfill_genesis_ops.py
docker exec musehub python3 /app/deploy/backfill_content_ids_from_snapshots.py
# Scope to a single repo
docker exec musehub python3 /app/deploy/backfill_genesis_ops.py --repo-id <repo_id>
The pattern for every backfill script:
--dry-runcounts rows without writing--repo-idlimits to one repo-qsuppresses progress output- Idempotent — safe to run multiple times
Inspecting the database
# Open a psql session inside the postgres container
docker exec -it musehub_postgres psql -U musehub -d musehub
# Run a one-shot query without an interactive session
docker exec musehub_postgres psql -U musehub -d musehub -c "SELECT COUNT(*) FROM musehub_symbol_history_entries;"
# Useful inspection queries:
# Current alembic revision
docker exec musehub_postgres psql -U musehub -d musehub \
-c "SELECT version_num FROM alembic_version;"
# Row counts for key tables
docker exec musehub_postgres psql -U musehub -d musehub -c "
SELECT
relname AS table,
n_live_tup AS rows
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 20;
"
# Check a specific symbol's history
docker exec musehub_postgres psql -U musehub -d musehub -c "
SELECT address, op, content_id, committed_at
FROM musehub_symbol_history_entries
WHERE address = 'src/app.ts'
ORDER BY committed_at ASC
LIMIT 10;
"
# Find entries with missing content_id (for backfill diagnosis)
docker exec musehub_postgres psql -U musehub -d musehub -c "
SELECT COUNT(*) FROM musehub_symbol_history_entries
WHERE content_id IS NULL AND address NOT LIKE '%::%';
"
Schema overview
Key tables and what they hold:
| Table | Primary key | Purpose |
|---|---|---|
musehub_repos |
repo_id |
Repo metadata |
musehub_commits |
commit_id |
Push records; commit_meta JSON has structured_delta |
musehub_snapshots |
snapshot_id |
File-tree manifest blobs (msgpack {path: content_id}) |
musehub_objects |
object_id |
Content-addressed binary objects |
musehub_symbol_history_entries |
(repo_id, address, commit_id) |
One row per symbol × commit; op is raw DomainOp type; op_payload is full op dict |
musehub_symbol_intel |
(repo_id, address) |
Per-symbol metrics: churn, gravity, blast |
musehub_hash_occurrence_entries |
(content_id, repo_id, address) |
Clone detection index |
musehub_intel_results |
(repo_id, intel_type) |
Aggregate intel blobs (snapshot, summary) |
musehub_commits.commit_meta JSON shape:
{
"structured_delta": {
"ops": [
{"op": "insert", "address": "src/app.py::Foo", "content_id": "sha256:...", ...}
]
},
"sem_ver_bump": "patch",
"agent_id": "claude-code",
"model_id": "claude-sonnet-4-6"
}
musehub_symbol_history_entries.op stores the raw DomainOp type verbatim:
insert, delete, replace, patch, mutate, move, directory_rename.
Common agent mistakes
Running alembic from the host — fails with ModuleNotFoundError: No module named 'musehub' or DATABASE_URL must be set. Always docker exec musehub python3 -m alembic ....
Running backfill scripts from the host — same problem. Always docker exec musehub python3 /app/deploy/....
Connecting psql to localhost — the DB port is not exposed to the host by default. Use docker exec musehub_postgres psql ....
Wrong container name — the active app container is musehub (not musehub-blue or musehub-green; those are staging/prod slot names). Use docker ps to confirm.