# MuseHub Cloud Infrastructure > Last updated: 2026-04-08 --- ## Overview MuseHub runs on AWS EC2 (us-east-1) behind nginx with Let's Encrypt TLS. The application stack is Docker Compose: musehub (uvicorn) + postgres:16 + musehub-runner. No managed RDS, no ECS, no load balancer — intentionally minimal for this stage. Two environments: | Environment | Domain | Instance | Elastic IP | Deploy status | |-------------|--------|----------|------------|---------------| | Production | `musehub.ai` | `i-0855d6efe7fa1a49d` (`musehub-prod`) | `98.89.99.211` | ⚠️ Not yet integrated — no IAM instance profile attached, SSM agent unreachable. `push.sh prod` will fail until the `musehub-ec2-ssm` IAM role is associated with the instance. | | Staging | `staging.musehub.ai` | `i-07547cd20bee2dea5` (`musehub-staging`) | `23.22.27.39` | ✅ Active — blue/green deploys working via `push.sh staging` | --- ## Shared AWS Resources | Resource | Value | |----------|-------| | Region | `us-east-1` | | AMI | `ami-0c7217cdde317cfec` (Ubuntu 22.04 LTS) | | Instance type | `t3.small` | | Security group | `sg-05815872537fcfe76` (`musehub-sg`) | | ECR registry | `992382692655.dkr.ecr.us-east-1.amazonaws.com` | | ECR repository | `musehub/musehub` | | IAM deploy user | `musehub-infra` (ECR push + SSM send) | | IAM instance role | `musehub-ec2-ssm` (ECR pull + SSM receive) | Security group inbound rules: - TCP 443 — HTTPS (Cloudflare IPs only, IPv4 + IPv6) Port 22 (SSH) and port 80 (HTTP) are not open. All remote access is via AWS SSM Session Manager. Cloudflare SSL mode is **Full (Strict)** — Cloudflare terminates TLS at the edge using a Cloudflare-issued cert, then connects to the origin on port 443 using the Cloudflare Origin Certificate at `/etc/ssl/cloudflare/origin.pem`. Nginx never needs to listen on port 80. Instance access requires the `musehub-infra` AWS credentials (default profile in `~/.aws/credentials`). --- ## Production Environment > ⚠️ **Prod deploy not yet active.** The instance has no IAM instance profile — the SSM > agent cannot register, so `push.sh prod` fails with `InvalidInstanceId`. To fix: > associate the `musehub-ec2-ssm` IAM role with `i-0855d6efe7fa1a49d` in the EC2 console > (Actions → Security → Modify IAM Role), then verify with > `aws ssm describe-instance-information --filters Key=InstanceIds,Values=i-0855d6efe7fa1a49d`. ### Instance ``` Instance ID : i-0855d6efe7fa1a49d Name : musehub-prod Elastic IP : 98.89.99.211 App dir : /opt/musehub ``` ### Namecheap DNS (musehub.ai) | Type | Host | Value | TTL | |------|------|-------|-----| | A Record | @ | 98.89.99.211 | Automatic | | A Record | www | 98.89.99.211 | Automatic | ### Stack ``` nginx (host, ports 80/443) └─ proxy_pass → 127.0.0.1:1337 └─ musehub container (uvicorn, port 1337) └─ depends_on → postgres container (port 5432 internal) musehub-runner container (polls musehub API for CI jobs) ``` ### Volumes | Volume | Contents | |--------|----------| | `musehub_data` | Object store — all pushed repo objects | | `postgres_data` | PostgreSQL data directory | | `runner_workspace` | CI job working directories | ### Environment variables (.env on instance at /opt/musehub/.env) ``` DEBUG=false DATABASE_URL=postgresql+asyncpg://musehub:@postgres:5432/musehub DB_PASSWORD= CORS_ORIGINS=["https://musehub.ai", "https://www.musehub.ai"] WEBHOOK_SECRET_KEY= MUSEHUB_ALLOWED_ORIGINS=["musehub.ai", "www.musehub.ai"] RUNNER_TOKEN= ``` ### Nginx config Final SSL config lives at `/etc/nginx/sites-available/musehub` on the instance. Reference copy: `deploy/nginx-ssl.conf`. Key timeouts: - `/push` and `/push/objects` — 300 s (large repo push serialization) - Everything else — 60 s ### SSL Let's Encrypt via Certbot. Auto-renews via cron (`certbot renew`). Certificate lives at `/etc/letsencrypt/live/musehub.ai/`. ### Instance access (SSM — no SSH) ```bash # Open an interactive shell on the prod instance aws ssm start-session --target i-0855d6efe7fa1a49d --region us-east-1 # Run a one-off command aws ssm send-command \ --instance-ids i-0855d6efe7fa1a49d \ --document-name "AWS-RunShellScript" \ --parameters 'commands=["sudo docker ps"]' \ --region us-east-1 \ --query "Command.CommandId" --output text ``` ### Useful commands on the instance Run via SSM (`aws ssm start-session --target --region us-east-1`). The active app slot is either `musehub-blue` (port 1337) or `musehub-green` (port 1338). ```bash # Which slot is live? cat /opt/musehub/.active-slot cat /etc/nginx/musehub-active-port # View running containers sudo docker ps # Tail live app logs (substitute blue/green as needed) sudo docker logs -f musehub-blue sudo docker logs -f musehub-green # Quick health check curl -s http://127.0.0.1:1337/healthz # blue slot curl -s http://127.0.0.1:1338/healthz # green slot # Run Alembic migrations manually (against the live DB) SLOT=$(cat /opt/musehub/.active-slot) DB_PASSWORD=$(grep ^DB_PASSWORD /opt/musehub/.env | cut -d= -f2) sudo docker run --rm \ --network musehub_musehub-internal \ --env-file /opt/musehub/.env \ -e "DATABASE_URL=postgresql+asyncpg://musehub:${DB_PASSWORD}@postgres:5432/musehub" \ : alembic upgrade head # Postgres shell (postgres container started by docker compose for the DB) sudo docker exec -it postgres psql -U musehub -d musehub # View nginx status sudo systemctl status nginx sudo nginx -t ``` --- ## Staging Environment ### Purpose Full production mirror with a separate DB, separate object store, and separate domain. Used for smoke tests before every prod deploy. Never exposed to users. ### Instance (provisioned by aws-provision-staging.sh) ``` Instance ID : i-07547cd20bee2dea5 Name : musehub-staging Elastic IP : 23.22.27.39 App dir : /opt/musehub Domain : staging.musehub.ai ``` ### Namecheap DNS (musehub.ai, Advanced DNS tab) | Type | Host | Value | TTL | |------|------|-------|-----| | A Record | staging | `23.22.27.39` | Automatic | ### Provisioning (one-time, run locally) ```bash # 1. Provision EC2 + EIP chmod +x deploy/aws-provision-staging.sh ./deploy/aws-provision-staging.sh # Note the instance ID and Elastic IP printed at the end. # 2. Add staging.musehub.ai A record on Namecheap (see above). # Wait for propagation (~5 min with Automatic TTL): watch -n 10 "dig staging.musehub.ai +short" # 3. Bootstrap the instance (installs AWS CLI, verifies ECR access) bash deploy/bootstrap-instance.sh staging # 4. Run setup script on the instance via SSM aws ssm send-command \ --instance-ids \ --document-name "AWS-RunShellScript" \ --parameters 'commands=["chmod +x /opt/musehub/deploy/setup-ec2-staging.sh && /opt/musehub/deploy/setup-ec2-staging.sh"]' \ --region us-east-1 # 5. Do the first deploy bash deploy/push.sh staging ``` ### Recovering a down staging instance (522 / Bad Gateway) **Symptom:** `staging.musehub.ai` returns Cloudflare 522 or Bad Gateway. **Root cause pattern:** The container stopped (either manually or after a reboot) and `--restart unless-stopped` did not fire because the container was in a stopped (not crashed) state when the instance last rebooted. **Fix — one SSM command, no polling:** ```bash CMD_ID=$(aws ssm send-command \ --region us-east-1 \ --instance-ids i-07547cd20bee2dea5 \ --document-name "AWS-RunShellScript" \ --parameters '{"commands":["sudo docker start musehub-blue musehub-worker 2>&1 && sudo musehub-set-slot blue && echo done"]}' \ --query "Command.CommandId" --output text) echo "Command sent: $CMD_ID" # Wait ~20s then check once: sleep 20 && aws ssm get-command-invocation \ --region us-east-1 \ --command-id "$CMD_ID" \ --instance-id i-07547cd20bee2dea5 \ --query "[Status,StandardOutputContent]" --output text ``` Check `staging.musehub.ai` in the browser — it should be back. **Critical rules when using SSM to recover staging:** 1. **Never reboot to fix SSM Pending.** A reboot stops containers that were manually started — `--restart unless-stopped` only auto-starts containers that were *running* (not stopped) at reboot time. Rebooting to fix SSM will take the site down and require a manual `docker start` anyway. 2. **Never poll SSM in a loop.** The shell `until`/`while sleep` pattern freezes the terminal and masks whether the command succeeded. Send the command, wait a fixed interval, fetch once. 3. **SSM Pending ≠ SSM broken.** The agent can show `Online` but queue commands as `Pending` for 10–30 seconds after a fresh start. Wait before concluding SSM is broken. 4. **InProgress means it will complete.** If a command shows `InProgress` it is executing on the instance — do not cancel or resend. Check back in 30s. ### Ongoing code deploys to staging ```bash # Standard — builds image locally, pushes to ECR, triggers blue-green on staging bash deploy/push.sh staging ``` ### Publishing a new muse CLI release The `install.sh` script (served at `https://staging.musehub.ai/install.sh`) downloads `muse-{version}.tar.gz` from `/releases/`. The version comes from `musehub/protocol/version.py` (`MUSE_VERSION`), which tracks the musehub package version. To ship a new muse build: ```bash # From ~/ecosystem/musehub — builds sdist, uploads to S3, SSMs to staging, # cleans up old tarballs (keeps 3), and verifies the URL is live. bash deploy/publish_muse_release.sh ``` What it does: 1. Builds `muse-{version}.tar.gz` from `~/ecosystem/muse` 2. Uploads to `s3://musehub-releases/muse-{version}.tar.gz` 3. SSMs to staging to copy from S3 → `/data/releases/` (Docker volume) 4. Deletes stale tarballs from S3 and the server (keeps the 3 newest) 5. Smoke-tests `https://staging.musehub.ai/releases/muse-{version}.tar.gz` **Note:** SSH is blocked on the instance (port 443 only). All server commands go through AWS SSM (`musehub-infra` IAM user). The staging instance (`i-07547cd20bee2dea5`) has the required IAM instance profile; no other instance does. To test the install script end-to-end locally: ```bash curl -fsSL https://staging.musehub.ai/install.sh | sh # verify ~/.local/bin/muse --version # cleanup rm -rf ~/.local/share/muse/venv && rm -f ~/.local/bin/muse ``` ### Instance access (SSM — no SSH) ```bash # Interactive shell on staging aws ssm start-session --target i-07547cd20bee2dea5 --region us-east-1 ``` --- ## Deployment Workflow Deploys are image-based via ECR. No SSH, no rsync, no code on the instance after provisioning. All deploy commands run from the local `~/ecosystem/musehub` directory. ### Deploy pipeline overview ``` Local machine (push.sh): 1. docker build (linux/amd64) 2. docker save → tar, crane push → ECR (musehub/musehub:) 3. aws ssm send-command → sync deploy.sh, then run it Instance (deploy.sh via SSM): 4. deploy.sh written from local copy (always current — never stale) 5. aws ecr get-login-password | docker login 6. docker pull : 7. docker run (migrations only, then exit) 8. docker run -d (new slot — blue or green) 9. curl /healthz until healthy 10. nginx -s reload (zero-downtime flip) 11. docker rm (old slot) ``` **Key invariant:** `push.sh` always writes the current local `deploy.sh` to the instance via SSM before running it. This means the instance's `deploy.sh` is always in sync with the local repo — there is no separate "sync the deploy scripts" step. ### ECR Push — Use crane (not docker push) `docker push` to ECR routes through Docker Desktop's VPNKit proxy (`http.docker.internal:3128` / `192.168.65.1:3128` from inside the VM). After a local IP change or a Docker Desktop restart, the VPNKit proxy drops connections mid-upload on large layer pushes, producing broken pipe errors. The fix is **crane** — Google's container registry tool — which pushes images directly from the macOS host network, bypassing the Docker Desktop VM layer and its proxy entirely. **crane is the standard push method. Never use `docker push` to ECR.** Install once: ```bash brew install crane ``` `push.sh` calls crane internally. If pushing manually outside the script: ```bash # 1. Build the image locally (linux/amd64 target) docker build --platform linux/amd64 -t musehub/musehub:latest . # 2. Save to a tar archive on the host docker save musehub/musehub:latest -o /tmp/musehub-latest.tar # 3. Authenticate crane against ECR aws ecr get-login-password --region us-east-1 \ | crane auth login 992382692655.dkr.ecr.us-east-1.amazonaws.com \ --username AWS --password-stdin # 4. Push with crane (runs entirely on the macOS host — no VPNKit involved) crane push /tmp/musehub-latest.tar \ 992382692655.dkr.ecr.us-east-1.amazonaws.com/musehub/musehub:latest ``` --- ### Standard deploy ```bash # Deploy to staging (always first) bash deploy/push.sh staging # Deploy to prod after staging smoke test bash deploy/push.sh prod # Or both in sequence bash deploy/push.sh staging prod ``` ### Rollback ```bash # List recent ECR image tags aws ecr describe-images \ --repository-name musehub/musehub \ --region us-east-1 \ --query 'sort_by(imageDetails,&imagePushedAt)[-10:].imageTags[0]' \ --output table # Redeploy a specific tag (skips build+push) IMAGE_TAG= bash deploy/push.sh staging IMAGE_TAG= bash deploy/push.sh prod ``` ### Emergency migration rollback (on instance via SSM) ```bash aws ssm send-command \ --instance-ids i-0855d6efe7fa1a49d \ --document-name "AWS-RunShellScript" \ --parameters 'commands=["cd /opt/musehub && sudo docker run --rm --network musehub_musehub-internal --env-file .env : alembic downgrade -1"]' \ --region us-east-1 ``` --- ## Backups No automated backup is configured yet. Planned: - Daily `pg_dump` compressed to S3 (or a second EBS snapshot) - Volume snapshot via AWS before every production deploy - Object store (`musehub_data`) is content-addressed — safe to snapshot at any time Until automated backups are set up, take a manual snapshot before every prod deploy: ```bash # On prod instance sudo docker compose exec postgres pg_dump -U musehub musehub | gzip > ~/musehub-backup-$(date +%Y%m%d).sql.gz ``` --- ## Costs (approximate, us-east-1, 2025 pricing) | Item | $/month | |------|---------| | t3.small (prod) | ~$15 | | t3.small (staging) | ~$15 (stop when not in use to reduce cost) | | Elastic IPs (2) | ~$0 while associated, $3.60/mo each if unassociated | | EBS gp3 20 GB (each) | ~$1.60 | | **Total (both running)** | **~$35/mo** | To pause staging when not needed: ```bash aws ec2 stop-instances --region us-east-1 --instance-ids # Start again with: aws ec2 start-instances --region us-east-1 --instance-ids ``` The Elastic IP stays associated while the instance is stopped — no charge. --- ## Secrets inventory All secrets live in `/opt/musehub/.env` on each instance. Never committed to source. | Secret | How generated | Rotation | |--------|---------------|----------| | `DB_PASSWORD` | `openssl rand -hex 16` | Manual, on compromise | | `WEBHOOK_SECRET_KEY` | Fernet key | Manual, on compromise | | `RUNNER_TOKEN` | `openssl rand -hex 32` | Manual, on compromise | Ed25519 identity keys live in `~/.muse/identity.toml` on each client machine. No server-side secret is involved in MSign auth — the public key in the DB is the credential.