MuseHub Cloud Infrastructure
Last updated: 2026-04-08
Overview
MuseHub runs on AWS EC2 (us-east-1) behind nginx with Let's Encrypt TLS. The application stack is Docker Compose: musehub (uvicorn) + postgres:16 + musehub-runner. No managed RDS, no ECS, no load balancer — intentionally minimal for this stage.
Two environments:
| Environment | Domain | Instance | Elastic IP | Deploy status |
|---|---|---|---|---|
| Production | musehub.ai | i-0855d6efe7fa1a49d (musehub-prod) | 98.89.99.211 | ⚠️ Not yet integrated: no IAM instance profile attached, SSM agent unreachable. push.sh prod will fail until the musehub-ec2-ssm IAM role is associated with the instance. |
| Staging | staging.musehub.ai | i-07547cd20bee2dea5 (musehub-staging) | 23.22.27.39 | ✅ Active: blue/green deploys working via push.sh staging |
Shared AWS Resources
| Resource | Value |
|---|---|
| Region | us-east-1 |
| AMI | ami-0c7217cdde317cfec (Ubuntu 22.04 LTS) |
| Instance type | t3.small |
| Security group | sg-05815872537fcfe76 (musehub-sg) |
| ECR registry | 992382692655.dkr.ecr.us-east-1.amazonaws.com |
| ECR repository | musehub/musehub |
| IAM deploy user | musehub-infra (ECR push + SSM send) |
| IAM instance role | musehub-ec2-ssm (ECR pull + SSM receive) |
Security group inbound rules:
- TCP 443 — HTTPS (Cloudflare IPs only, IPv4 + IPv6)
Port 22 (SSH) and port 80 (HTTP) are not open. All remote access is via AWS SSM Session Manager.
Cloudflare SSL mode is Full (Strict) — Cloudflare terminates TLS at the edge using a
Cloudflare-issued cert, then connects to the origin on port 443 using the Cloudflare Origin
Certificate at /etc/ssl/cloudflare/origin.pem. Nginx never needs to listen on port 80.
Instance access requires the musehub-infra AWS credentials (default profile in ~/.aws/credentials).
Production Environment
⚠️ Prod deploy not yet active. The instance has no IAM instance profile, so the SSM agent cannot register and push.sh prod fails with InvalidInstanceId. To fix: associate the musehub-ec2-ssm IAM role with i-0855d6efe7fa1a49d in the EC2 console (Actions → Security → Modify IAM Role), then verify with aws ssm describe-instance-information --filters Key=InstanceIds,Values=i-0855d6efe7fa1a49d.
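The console step above probably has a CLI equivalent. A minimal sketch, assuming the instance profile carries the same name as the role (musehub-ec2-ssm), which is the usual result when a role is created through the IAM console but is not confirmed here. attach_ssm_profile is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: attach an IAM instance profile so the SSM agent can register.
# Assumption: the instance profile is named after the role (musehub-ec2-ssm).
attach_ssm_profile() {
  local instance_id="$1"
  local profile_name="${2:-musehub-ec2-ssm}"
  aws ec2 associate-iam-instance-profile \
    --region us-east-1 \
    --instance-id "$instance_id" \
    --iam-instance-profile "Name=${profile_name}"
}

# Usage (not run automatically):
#   attach_ssm_profile i-0855d6efe7fa1a49d
```

The calling credentials need ec2:AssociateIamInstanceProfile and iam:PassRole. The musehub-infra deploy user is described as ECR push + SSM send only, so it may not have them.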
Instance
Instance ID : i-0855d6efe7fa1a49d
Name : musehub-prod
Elastic IP : 98.89.99.211
App dir : /opt/musehub
Namecheap DNS (musehub.ai)
| Type | Host | Value | TTL |
|---|---|---|---|
| A Record | @ | 98.89.99.211 | Automatic |
| A Record | www | 98.89.99.211 | Automatic |
Stack
nginx (host, ports 80/443)
└─ proxy_pass → 127.0.0.1:1337
└─ musehub container (uvicorn, port 1337)
└─ depends_on → postgres container (port 5432 internal)
musehub-runner container (polls musehub API for CI jobs)
Volumes
| Volume | Contents |
|---|---|
| musehub_data | Object store: all pushed repo objects |
| postgres_data | PostgreSQL data directory |
| runner_workspace | CI job working directories |
Environment variables (.env on instance at /opt/musehub/.env)
DEBUG=false
DATABASE_URL=postgresql+asyncpg://musehub:<DB_PASSWORD>@postgres:5432/musehub
DB_PASSWORD=<generated at provision time>
CORS_ORIGINS=["https://musehub.ai", "https://www.musehub.ai"]
WEBHOOK_SECRET_KEY=<generated Fernet key>
MUSEHUB_ALLOWED_ORIGINS=["musehub.ai", "www.musehub.ai"]
RUNNER_TOKEN=<generated at provision time>
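A quick way to confirm an instance's .env defines every variable listed above. This is a sketch only: missing_env_keys is a hypothetical helper, and it checks that a KEY= line exists, not that the value is valid.

```shell
#!/usr/bin/env bash
# Print each expected key that has no KEY= line in the given .env file.
# The key list mirrors the variables shown above; the helper name is hypothetical.
missing_env_keys() {
  local env_file="$1" key
  for key in DEBUG DATABASE_URL DB_PASSWORD CORS_ORIGINS \
             WEBHOOK_SECRET_KEY MUSEHUB_ALLOWED_ORIGINS RUNNER_TOKEN; do
    grep -q "^${key}=" "$env_file" || printf '%s\n' "$key"
  done
}

# Usage (on the instance):
#   missing_env_keys /opt/musehub/.env
```

No output means all seven keys are present.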
Nginx config
Final SSL config lives at /etc/nginx/sites-available/musehub on the instance.
Reference copy: deploy/nginx-ssl.conf.
Key timeouts:
- /push and /push/objects: 300 s (large repo push serialization)
- Everything else: 60 s
SSL
Let's Encrypt via Certbot. Auto-renews via cron (certbot renew).
Certificate lives at /etc/letsencrypt/live/musehub.ai/.
Instance access (SSM — no SSH)
# Open an interactive shell on the prod instance
aws ssm start-session --target i-0855d6efe7fa1a49d --region us-east-1
# Run a one-off command
aws ssm send-command \
--instance-ids i-0855d6efe7fa1a49d \
--document-name "AWS-RunShellScript" \
--parameters 'commands=["sudo docker ps"]' \
--region us-east-1 \
--query "Command.CommandId" --output text
Useful commands on the instance
Run via SSM (aws ssm start-session --target <instance-id> --region us-east-1).
The active app slot is either musehub-blue (port 1337) or musehub-green (port 1338).
# Which slot is live?
cat /opt/musehub/.active-slot
cat /etc/nginx/musehub-active-port
# View running containers
sudo docker ps
# Tail live app logs (substitute blue/green as needed)
sudo docker logs -f musehub-blue
sudo docker logs -f musehub-green
# Quick health check
curl -s http://127.0.0.1:1337/healthz # blue slot
curl -s http://127.0.0.1:1338/healthz # green slot
# Run Alembic migrations manually (against the live DB)
SLOT=$(cat /opt/musehub/.active-slot)
DB_PASSWORD=$(grep ^DB_PASSWORD /opt/musehub/.env | cut -d= -f2)
sudo docker run --rm \
--network musehub_musehub-internal \
--env-file /opt/musehub/.env \
-e "DATABASE_URL=postgresql+asyncpg://musehub:${DB_PASSWORD}@postgres:5432/musehub" \
<ecr-image>:<tag> alembic upgrade head
# Postgres shell (postgres container started by docker compose for the DB)
sudo docker exec -it postgres psql -U musehub -d musehub
# View nginx status
sudo systemctl status nginx
sudo nginx -t
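The slot-to-port convention above (blue on 1337, green on 1338) can be wrapped in a small helper so the health check always targets the live slot. A sketch, assuming /opt/musehub/.active-slot contains just the word blue or green. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Map a slot name to its local port, per the blue/green convention above.
slot_port() {
  case "$1" in
    blue)  echo 1337 ;;
    green) echo 1338 ;;
    *)     echo "unknown slot: $1" >&2; return 1 ;;
  esac
}

# Health-check whichever slot the active-slot file names.
# Assumption: the file contains just "blue" or "green".
check_active_slot() {
  local slot port
  slot="$(cat "${1:-/opt/musehub/.active-slot}")" || return 1
  port="$(slot_port "$slot")" || return 1
  curl -fsS "http://127.0.0.1:${port}/healthz"
}

# Usage (on the instance):
#   check_active_slot
```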
Staging Environment
Purpose
Full production mirror with a separate DB, separate object store, and separate domain. Used for smoke tests before every prod deploy. Never exposed to users.
Instance (provisioned by aws-provision-staging.sh)
Instance ID : i-07547cd20bee2dea5
Name : musehub-staging
Elastic IP : 23.22.27.39
App dir : /opt/musehub
Domain : staging.musehub.ai
Namecheap DNS (musehub.ai, Advanced DNS tab)
| Type | Host | Value | TTL |
|---|---|---|---|
| A Record | staging | 23.22.27.39 | Automatic |
Provisioning (one-time, run locally)
# 1. Provision EC2 + EIP
chmod +x deploy/aws-provision-staging.sh
./deploy/aws-provision-staging.sh
# Note the instance ID and Elastic IP printed at the end.
# 2. Add staging.musehub.ai A record on Namecheap (see above).
# Wait for propagation (~5 min with Automatic TTL):
watch -n 10 "dig staging.musehub.ai +short"
# 3. Bootstrap the instance (installs AWS CLI, verifies ECR access)
bash deploy/bootstrap-instance.sh staging
# 4. Run setup script on the instance via SSM
aws ssm send-command \
--instance-ids <instance-id> \
--document-name "AWS-RunShellScript" \
--parameters 'commands=["chmod +x /opt/musehub/deploy/setup-ec2-staging.sh && /opt/musehub/deploy/setup-ec2-staging.sh"]' \
--region us-east-1
# 5. Do the first deploy
bash deploy/push.sh staging
Recovering a down staging instance (522 / Bad Gateway)
Symptom: staging.musehub.ai returns Cloudflare 522 or Bad Gateway.
Root cause pattern: The container stopped (either manually or after a reboot)
and --restart unless-stopped did not fire because the container was in a
stopped (not crashed) state when the instance last rebooted.
Fix — one SSM command, no polling:
CMD_ID=$(aws ssm send-command \
--region us-east-1 \
--instance-ids i-07547cd20bee2dea5 \
--document-name "AWS-RunShellScript" \
--parameters '{"commands":["sudo docker start musehub-blue musehub-worker 2>&1 && sudo musehub-set-slot blue && echo done"]}' \
--query "Command.CommandId" --output text)
echo "Command sent: $CMD_ID"
# Wait ~20s then check once:
sleep 20 && aws ssm get-command-invocation \
--region us-east-1 \
--command-id "$CMD_ID" \
--instance-id i-07547cd20bee2dea5 \
--query "[Status,StandardOutputContent]" --output text
Check staging.musehub.ai in the browser — it should be back.
Critical rules when using SSM to recover staging:
- Never reboot to fix SSM Pending. A reboot stops containers that were manually started, and --restart unless-stopped only auto-starts containers that were running (not stopped) at reboot time. Rebooting to fix SSM will take the site down and require a manual docker start anyway.
- Never poll SSM in a loop. The shell until/while sleep pattern freezes the terminal and masks whether the command succeeded. Send the command, wait a fixed interval, fetch once.
- SSM Pending ≠ SSM broken. The agent can show Online but queue commands as Pending for 10–30 seconds after a fresh start. Wait before concluding SSM is broken.
- InProgress means it will complete. If a command shows InProgress it is executing on the instance, so do not cancel or resend. Check back in 30s.
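The rules above amount to a single fetch followed by a decision based on the returned status. A sketch of that decision, using the Status values that aws ssm get-command-invocation returns. The function names and the wording of the suggested actions are illustrative, not part of the deploy scripts.

```shell
#!/usr/bin/env bash
# Translate an SSM invocation Status into the next action the rules above call for.
ssm_next_action() {
  case "$1" in
    Pending|Delayed) echo "wait: command not yet picked up, do not reboot" ;;
    InProgress)      echo "wait: command is running, do not cancel or resend" ;;
    Success)         echo "done" ;;
    Failed|TimedOut|Cancelled|Cancelling)
                     echo "inspect: read StandardErrorContent before retrying" ;;
    *)               echo "unknown status: $1" ;;
  esac
}

# Fetch once (no loop) and print the status together with the suggested action.
# Usage: ssm_check_once <command-id> <instance-id>
ssm_check_once() {
  local status
  status="$(aws ssm get-command-invocation \
    --region us-east-1 \
    --command-id "$1" \
    --instance-id "$2" \
    --query Status --output text)" || return 1
  printf '%s -> %s\n' "$status" "$(ssm_next_action "$status")"
}
```

For example, ssm_check_once "$CMD_ID" i-07547cd20bee2dea5 after the fixed wait prints one line and exits, which keeps the terminal free.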
Ongoing code deploys to staging
# Standard — builds image locally, pushes to ECR, triggers blue-green on staging
bash deploy/push.sh staging
Publishing a new muse CLI release
The install.sh script (served at https://staging.musehub.ai/install.sh) downloads
muse-{version}.tar.gz from /releases/. The version comes from musehub/protocol/version.py
(MUSE_VERSION), which tracks the musehub package version.
To ship a new muse build:
# From ~/ecosystem/musehub — builds sdist, uploads to S3, SSMs to staging,
# cleans up old tarballs (keeps 3), and verifies the URL is live.
bash deploy/publish_muse_release.sh
What it does:
- Builds muse-{version}.tar.gz from ~/ecosystem/muse
- Uploads to s3://musehub-releases/muse-{version}.tar.gz
- SSMs to staging to copy from S3 → /data/releases/ (Docker volume)
- Deletes stale tarballs from S3 and the server (keeps the 3 newest)
- Smoke-tests https://staging.musehub.ai/releases/muse-{version}.tar.gz
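The "keeps the 3 newest" cleanup can be illustrated with a small filter. This is a sketch, not the logic in publish_muse_release.sh: it assumes "newest" means highest version number and relies on sort -V (available in GNU coreutils and recent BSD sort). stale_tarballs is a hypothetical name.

```shell
#!/usr/bin/env bash
# Read tarball names on stdin and print the ones to delete, keeping the N newest.
# Assumption: "newest" means highest version, compared with sort -V.
stale_tarballs() {
  local keep="${1:-3}"
  sort -V | awk -v keep="$keep" '
    { names[NR] = $0 }
    END { for (i = 1; i <= NR - keep; i++) print names[i] }'
}

# Usage:
#   ls /data/releases | stale_tarballs 3
```

Version sort matters here because a plain lexical sort would place muse-0.10.0 before muse-0.2.0 and delete the wrong files.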
Note: SSH is blocked on the instance (port 443 only). All server commands go through
AWS SSM (musehub-infra IAM user). The staging instance (i-07547cd20bee2dea5) has the
required IAM instance profile; no other instance does.
To test the install script end-to-end locally:
curl -fsSL https://staging.musehub.ai/install.sh | sh
# verify
~/.local/bin/muse --version
# cleanup
rm -rf ~/.local/share/muse/venv && rm -f ~/.local/bin/muse
Instance access (SSM — no SSH)
# Interactive shell on staging
aws ssm start-session --target i-07547cd20bee2dea5 --region us-east-1
Deployment Workflow
Deploys are image-based via ECR. No SSH, no rsync, no code on the instance after provisioning.
All deploy commands run from the local ~/ecosystem/musehub directory.
Deploy pipeline overview
Local machine (push.sh):
1. docker build (linux/amd64)
2. docker save → tar, crane push → ECR (musehub/musehub:<tag>)
3. aws ssm send-command → sync deploy.sh, then run it
Instance (deploy.sh via SSM):
4. deploy.sh written from local copy (always current — never stale)
5. aws ecr get-login-password | docker login
6. docker pull <ecr>:<tag>
7. docker run (migrations only, then exit)
8. docker run -d (new slot — blue or green)
9. curl /healthz until healthy
10. nginx -s reload (zero-downtime flip)
11. docker rm (old slot)
Key invariant: push.sh always writes the current local deploy.sh to the instance
via SSM before running it. This means the instance's deploy.sh is always in sync with
the local repo — there is no separate "sync the deploy scripts" step.
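Steps 8 and 9 of the pipeline (start the idle slot, then wait for /healthz) can be sketched as below. deploy.sh itself is not reproduced in this document, so the function names, the attempt count and the delay are all assumptions.

```shell
#!/usr/bin/env bash
# Given the live slot, name the idle slot that the next deploy should start.
other_slot() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "unknown slot: $1" >&2; return 1 ;;
  esac
}

# Run a probe command up to <attempts> times, pausing <delay> seconds between
# tries. Returns 0 on the first success, 1 if every attempt fails.
# Usage: wait_healthy <attempts> <delay> <probe command...>
wait_healthy() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" >/dev/null 2>&1 && return 0
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then sleep "$delay"; fi
  done
  return 1
}
```

On the instance this might be called as wait_healthy 30 2 curl -fsS http://127.0.0.1:1338/healthz. Because the loop is bounded, a slot that never becomes healthy fails the deploy before nginx is reloaded, leaving the old slot serving traffic.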
ECR Push — Use crane (not docker push)
docker push to ECR routes through Docker Desktop's VPNKit proxy
(http.docker.internal:3128 / 192.168.65.1:3128 from inside the VM). After a local IP
change or a Docker Desktop restart, the VPNKit proxy drops connections mid-upload on large
layer pushes, producing broken pipe errors. The fix is crane — Google's container
registry tool — which pushes images directly from the macOS host network, bypassing the
Docker Desktop VM layer and its proxy entirely.
crane is the standard push method. Never use docker push to ECR.
Install once:
brew install crane
push.sh calls crane internally. If pushing manually outside the script:
# 1. Build the image locally (linux/amd64 target)
docker build --platform linux/amd64 -t musehub/musehub:latest .
# 2. Save to a tar archive on the host
docker save musehub/musehub:latest -o /tmp/musehub-latest.tar
# 3. Authenticate crane against ECR
aws ecr get-login-password --region us-east-1 \
| crane auth login 992382692655.dkr.ecr.us-east-1.amazonaws.com \
--username AWS --password-stdin
# 4. Push with crane (runs entirely on the macOS host — no VPNKit involved)
crane push /tmp/musehub-latest.tar \
992382692655.dkr.ecr.us-east-1.amazonaws.com/musehub/musehub:latest
Standard deploy
# Deploy to staging (always first)
bash deploy/push.sh staging
# Deploy to prod after staging smoke test
bash deploy/push.sh prod
# Or both in sequence
bash deploy/push.sh staging prod
Rollback
# List recent ECR image tags
aws ecr describe-images \
--repository-name musehub/musehub \
--region us-east-1 \
--query 'sort_by(imageDetails,&imagePushedAt)[-10:].imageTags[0]' \
--output table
# Redeploy a specific tag (skips build+push)
IMAGE_TAG=<previous-tag> bash deploy/push.sh staging
IMAGE_TAG=<previous-tag> bash deploy/push.sh prod
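Before redeploying an old tag it can help to confirm the tag still exists in ECR. A sketch using the registry and repository from the Shared AWS Resources table. image_ref and tag_exists are hypothetical helpers, not part of push.sh.

```shell
#!/usr/bin/env bash
# Compose the full ECR image reference for a tag.
image_ref() {
  local tag="${1:?usage: image_ref <tag>}"
  printf '%s/%s:%s\n' \
    "992382692655.dkr.ecr.us-east-1.amazonaws.com" "musehub/musehub" "$tag"
}

# Return 0 if the tag is present in the repository, non-zero otherwise.
tag_exists() {
  aws ecr describe-images \
    --region us-east-1 \
    --repository-name musehub/musehub \
    --image-ids "imageTag=$1" >/dev/null 2>&1
}

# Usage:
#   tag_exists <previous-tag> && IMAGE_TAG=<previous-tag> bash deploy/push.sh staging
```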
Emergency migration rollback (on instance via SSM)
aws ssm send-command \
--instance-ids i-0855d6efe7fa1a49d \
--document-name "AWS-RunShellScript" \
--parameters 'commands=["cd /opt/musehub && sudo docker run --rm --network musehub_musehub-internal --env-file .env <ecr-image>:<tag> alembic downgrade -1"]' \
--region us-east-1
Backups
No automated backup is configured yet. Planned:
- Daily pg_dump compressed to S3 (or a second EBS snapshot)
- Volume snapshot via AWS before every production deploy
- Object store (musehub_data) is content-addressed, so it is safe to snapshot at any time
Until automated backups are set up, take a manual snapshot before every prod deploy:
# On prod instance
# -T disables TTY allocation so the dump is not altered when piped to gzip
sudo docker compose exec -T postgres pg_dump -U musehub musehub | gzip > ~/musehub-backup-$(date +%Y%m%d).sql.gz
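A manual dump is only useful if the archive is readable. A minimal integrity check, assuming the file naming used above. verify_backup is a hypothetical helper, and it validates the gzip container only, not the completeness of the SQL inside it.

```shell
#!/usr/bin/env bash
# Return 0 only if the file exists, is non-empty, and passes gzip's integrity test.
verify_backup() {
  local file="$1"
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  gzip -t "$file" 2>/dev/null || { echo "corrupt gzip: $file" >&2; return 1; }
}

# Usage:
#   verify_backup ~/musehub-backup-$(date +%Y%m%d).sql.gz && echo "backup OK"
```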
Costs (approximate, us-east-1, 2025 pricing)
| Item | $/month |
|---|---|
| t3.small (prod) | ~$15 |
| t3.small (staging) | ~$15 (stop when not in use to reduce cost) |
| Elastic IPs (2) | ~$7.30 (AWS bills every public IPv4 address at $0.005/hr, about $3.65/mo each, whether or not it is associated) |
| EBS gp3 20 GB (each) | ~$1.60 |
| Total (both running) | ~$41/mo |
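The instance figures in the table follow from hourly rate times hours per month. A sketch of that arithmetic, assuming 730 hours per month and a t3.small on-demand rate of $0.0208/hr in us-east-1. The rate is an assumption; check it against the current AWS price list.

```shell
#!/usr/bin/env bash
# Monthly cost from an hourly rate, using 730 hours per month by default.
# Usage: monthly_cost <hourly-rate> [hours]
monthly_cost() {
  awk -v rate="$1" -v hours="${2:-730}" 'BEGIN { printf "%.2f\n", rate * hours }'
}

# Examples:
#   monthly_cost 0.0208        # always-on t3.small
#   monthly_cost 0.0208 176    # staging run 8 h/day on 22 working days
```

With the assumed rate, the first call gives 15.18 and the second 3.66, which is the saving that stopping staging outside working hours aims for.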
To pause staging when not needed:
aws ec2 stop-instances --region us-east-1 --instance-ids <STAGING_INSTANCE_ID>
# Start again with:
aws ec2 start-instances --region us-east-1 --instance-ids <STAGING_INSTANCE_ID>
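The Elastic IP stays associated while the instance is stopped, so the address does not change on restart. AWS still bills the public IPv4 address (about $3.65/mo) and the EBS volume while the instance is stopped; only the instance-hour charge pauses.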
Secrets inventory
All secrets live in /opt/musehub/.env on each instance. Never committed to source.
| Secret | How generated | Rotation |
|---|---|---|
| DB_PASSWORD | openssl rand -hex 16 | Manual, on compromise |
| WEBHOOK_SECRET_KEY | Fernet key | Manual, on compromise |
| RUNNER_TOKEN | openssl rand -hex 32 | Manual, on compromise |
Ed25519 identity keys live in ~/.muse/identity.toml on each client machine.
No server-side secret is involved in MSign auth — the public key in the DB is the credential.
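After a manual rotation, a format check can catch a truncated or mistyped value before the stack restarts. openssl rand -hex N prints 2N lowercase hex characters, so DB_PASSWORD should be 32 characters and RUNNER_TOKEN 64. is_hex_secret is a hypothetical helper; it checks shape only, not strength.

```shell
#!/usr/bin/env bash
# Succeed if the value is exactly <bytes>*2 lowercase hex characters, which is
# what `openssl rand -hex <bytes>` prints.
is_hex_secret() {
  local LC_ALL=C
  local value="$1" bytes="$2"
  [ "${#value}" -eq $((bytes * 2)) ] || return 1
  case "$value" in
    *[!0-9a-f]*) return 1 ;;
  esac
  return 0
}

# Usage:
#   is_hex_secret "$DB_PASSWORD" 16 || echo "DB_PASSWORD has unexpected format"
#   is_hex_secret "$RUNNER_TOKEN" 32 || echo "RUNNER_TOKEN has unexpected format"
```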