pre-launch-checklist.md markdown
sha256:e17dff4303a5885a1a61af6b39594fe316aeb62821bad17ccb66c52128a315f5 fix: enforce repo visibility gate on all SSR route handlers… Sonnet 4.6 minor ⚠ breaking 12 hours ago

MuseHub Pre-Launch Checklist

This document governs what must be complete, verified, and signed off before MuseHub opens to users beyond gabriel. Items are grouped by domain. Nothing ships until every checkbox is checked. Re-check after any major refactor.


0. Philosophy

The threat model is realistic: we are open source, so attackers can read every route, every query, every auth flow. We assume they will. The bar is not "unbreakable against nation-states." The bar is: a well-resourced, intelligent adversary with full source access cannot compromise data, impersonate users, or take the service down with commodity tooling. Normal abuse — scrapers, credential stuffing, path traversal attempts, large payload bombs — should be a non-event.


1. Authentication & Authorization

1.1 Ed25519 / MSign

Auth is Ed25519 per-request signing (MSign). No server secret, no token expiry, no refresh. The public key registered in the DB is the credential.

  • [x] No server secret, no ACCESS_TOKEN_SECRET — auth is pure Ed25519 key pairs
  • [x] Per-request Ed25519 signature verified on every protected route (require_signed_request)
  • [x] 30-second replay window enforced (REPLAY_WINDOW_SECONDS = 30 in request_signing.py)
  • [x] Challenge nonce is single-use — consumed with .pop() on first verify; 5-min TTL for GC
  • [x] identity.toml keys are per-host, never shared across environments (muse CLI enforces this)
  • [x] Key revocation: compromised key deleted via DELETE /api/auth/keys/{handle}/{key_id}
  • [x] Auth endpoints rate-limited at 20 req/min per IP via slowapi (AUTH_LIMIT in rate_limits.py)
  • [x] Bearer tokens explicitly rejected with 401 (MSign is the only accepted scheme)
  • [x] WWW-Authenticate: MSign realm="musehub" returned on all 401 responses
  • [x] Failed-auth-specific rate limiting: musehub/auth/failure_limiter.py — in-memory per-IP failure counter with exponential backoff. Thresholds: 5→30s, 10→5min, 20→15min. Wired into POST /api/auth/verify (check before, record_failure on AuthError, record_success on ok).
  • [x] No CAPTCHA needed — there is no password or secret the attacker could guess; the private key never leaves the client machine
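
The failure thresholds listed above (5→30s, 10→5min, 20→15min) could be implemented roughly as follows. This is a minimal sketch, not the contents of failure_limiter.py: the class name FailureLimiter, the BACKOFF_STEPS table, and the injectable now parameter are assumptions made for illustration. Only the method names check, record_failure, and record_success come from the checklist.

```python
import time
from typing import Optional

# (failure count, lockout seconds), highest threshold first.
# Values mirror the checklist; the table itself is a hypothetical layout.
BACKOFF_STEPS = [(20, 15 * 60), (10, 5 * 60), (5, 30)]


class FailureLimiter:
    """In-memory per-IP failure counter with stepped lockouts (sketch)."""

    def __init__(self) -> None:
        self._failures: dict[str, int] = {}
        self._blocked_until: dict[str, float] = {}

    def check(self, ip: str, now: Optional[float] = None) -> float:
        """Return the seconds the caller must still wait; 0.0 means allowed."""
        now = time.monotonic() if now is None else now
        return max(0.0, self._blocked_until.get(ip, 0.0) - now)

    def record_failure(self, ip: str, now: Optional[float] = None) -> None:
        """Count one failed verification and apply the matching lockout."""
        now = time.monotonic() if now is None else now
        count = self._failures.get(ip, 0) + 1
        self._failures[ip] = count
        for threshold, lockout in BACKOFF_STEPS:
            if count >= threshold:
                self._blocked_until[ip] = now + lockout
                break

    def record_success(self, ip: str) -> None:
        """Clear all state for this IP after a successful verification."""
        self._failures.pop(ip, None)
        self._blocked_until.pop(ip, None)
```

A handler would call check before verifying, then record_failure or record_success depending on the outcome. Because the state is in-memory, it is per-process and resets on restart.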

1.2 Authorization (ownership checks)

  • [x] Every repo-scoped JSON API endpoint asserts repo.owner == current_user (or team membership). Destructive/state-changing ops (delete, transfer, close, merge, assign, label, milestone) use _guard_owner / _guard_repo_owner helpers. Collaborator team membership is future work.
  • [ ] SSR UI layer visibility gate — all 14 repo-scoped ui_*.py route handlers must check repo.visibility before serving content. Fixed in issue #90 (task/ssr-visibility-gate). Previously, private repos returned HTTP 200 to anonymous browser requests. ui_repo_settings.py and ui_sessions.py additionally require claims.handle == owner.
  • [x] Repo visibility (public/private) is checked before serving any object, blob, or archive. All JSON API GET endpoints gate on optional_token + repo.visibility != "public" check.
  • [x] Object download (/archive, /blob, /object) cannot be path-traversed to another repo. get_file_at_ref resolves via snapshot manifest; get_object_row filters by repo_id AND object_id in SQL, so the DB is the authority and no path concatenation occurs.
  • [x] Issue, merge-proposal, and comment endpoints verify the caller owns the parent repo. _guard_repo_owner added to: close/reopen/update/assign/milestone/labels on issues; merge/request-reviewers/remove-reviewer on proposals; delete-comment. _guard_write_access added to: create-issue, create-comment, create-proposal, create-proposal-comment (private repos: owner-only; public repos: any authenticated user).
  • [ ] Admin-only endpoints (/api/admin/*) are gated by a separate is_admin claim. No /api/admin/* routes exist yet. MSignContext.is_admin is defined but always False. Low priority: no system-admin operations are needed pre-launch.
  • [x] There is no "owner" field that a caller can self-assign via POST body. create_repo sets owner_user_id from claims (the authenticated caller), never from the request body. The body's owner field is a display slug only. PATCH on identities explicitly whitelists allowed fields and does not expose handle or identity_type.
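
The access rules described in this section reduce to three small predicates. The sketch below is illustrative only: the Repo dataclass and the function names can_view, can_write, and is_owner are hypothetical stand-ins, not the project's _guard_* helpers, and team membership is left out because the checklist marks it as future work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Repo:
    owner: str        # handle of the owning identity
    visibility: str   # "public" or "private"


def is_owner(repo: Repo, caller: Optional[str]) -> bool:
    """Destructive or state-changing operations: owner only."""
    return caller is not None and caller == repo.owner


def can_view(repo: Repo, caller: Optional[str]) -> bool:
    """Read access: anyone for public repos, owner only for private ones."""
    if repo.visibility == "public":
        return True
    return is_owner(repo, caller)


def can_write(repo: Repo, caller: Optional[str]) -> bool:
    """Create issue/comment/proposal: any authenticated user on public
    repos, owner only on private repos."""
    if caller is None:
        return False
    if repo.visibility == "public":
        return True
    return is_owner(repo, caller)
```

Treating anything other than the literal string "public" as private means an unexpected visibility value fails closed.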

1.3 Key hygiene

  • [x] Private keys are never logged — auth logs contain only handle, algo, key_id, and fingerprint prefixes; public_key_b64, signature_b64, and Authorization header values are never passed to any logger
  • [x] MSign signatures are never returned in redirect URLs — all RedirectResponse targets are static paths or /{owner}/{repo}?welcome=1; no auth material in any Location header
  • [x] Authorization: MSign is the only accepted transport — no ?token= query param, no Bearer fallback path

2. Input Validation & Injection

2.1 Path / traversal

  • [x] All owner, repo, branch, path URL segments are validated against an allowlist regex (^[a-zA-Z0-9_.-]+$, length-capped) before touching the filesystem or DB
  • [x] Constructed file paths are resolved with Path.resolve() and checked to be inside the expected root — no ../../ escapes
  • [x] Objects are fetched from the blob store (R2/MinIO) by object_id; no disk paths involved
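
A minimal sketch of the two path checks above, assuming hypothetical helper names (validate_segment, safe_join) and an assumed length cap of 255. The regex is the one quoted in the checklist. Note that this regex on its own accepts "." and "..", so the sketch rejects those explicitly, and it uses fullmatch so a trailing newline cannot slip past the "$" anchor.

```python
import re
from pathlib import Path

SEGMENT_RE = re.compile(r"^[a-zA-Z0-9_.-]+$")
MAX_SEGMENT_LEN = 255  # assumed cap; the checklist only says "length-capped"


def validate_segment(value: str) -> str:
    """Return value unchanged if it is a safe URL segment, else raise."""
    if len(value) > MAX_SEGMENT_LEN or not SEGMENT_RE.fullmatch(value):
        raise ValueError(f"invalid segment: {value!r}")
    if value in {".", ".."}:
        raise ValueError("dot segments are not allowed")
    return value


def safe_join(root: Path, *segments: str) -> Path:
    """Join segments under root and verify the result stays inside root."""
    resolved_root = root.resolve()
    candidate = resolved_root.joinpath(*segments).resolve()
    # Path.is_relative_to requires Python 3.9+.
    if not candidate.is_relative_to(resolved_root):
        raise ValueError("path escapes the expected root")
    return candidate
```

The containment check is kept separate from segment validation so that it still catches an escape if a caller forgets to validate first.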

2.2 SQL / ORM

  • [x] Zero raw SQL string interpolation anywhere in the codebase (muse content-grep "f\"" audit)
  • [x] All queries go through the ORM or parameterized text() with bound params
  • [x] Search / filter inputs are sanitized before being passed to LIKE or tsvector
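
One way to sanitize input for LIKE is to escape the wildcard characters and still pass the pattern as a bound parameter. The sketch below uses the standard-library sqlite3 module only so that it runs without a server; the table name repos and the helpers escape_like and search_repos are assumptions, and the project's actual queries go through its ORM against Postgres.

```python
import sqlite3


def escape_like(term: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards so user input only ever matches literally."""
    return (
        term.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def search_repos(conn: sqlite3.Connection, term: str) -> list[str]:
    """Substring search with the pattern passed as a bound parameter."""
    pattern = f"%{escape_like(term)}%"
    rows = conn.execute(
        # The SQL text is constant; only the bound value varies.
        "SELECT name FROM repos WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (pattern,),
    )
    return [row[0] for row in rows]
```

The escape character is replaced first so that backslashes added for "%" and "_" are not doubled again.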

2.3 Payload / content

  • [x] Request bodies have a hard size cap (e.g., 10 MB for API, 100 MB for object upload) enforced at the ASGI/nginx layer, not just application code (nginx: client_max_body_size 500m; ASGI: ContentSizeLimitMiddleware — 10 MB API, 500 MB push)
  • [x] Markdown rendered server-side is sanitized (no raw <script>, <iframe>, event handlers) — use a strict allowlist renderer (e.g., bleach + mistune) (mistune 3.x HTMLRenderer(escape=True) escapes all raw HTML; javascript: URLs blocked; see jinja2_filters.py)
  • [x] Filenames in archives are validated before extraction (zip-slip prevention) (N/A — no server-side archive extraction in current codebase; objects are stored as content-addressed blobs)
  • [x] YAML / TOML config uploaded by users is parsed in a sandbox, not eval'd (CI workflow YAML uses yaml.safe_load with 256 KiB size limit; no other user-uploaded config parsed)

2.4 Object / commit integrity

  • [x] Every pushed object is content-addressed: SHA-256 of payload must match its stored ID (_verify_object_hash in musehub_wire.py — applied in both wire_push and wire_push_objects before any bytes touch the storage backend; non-sha256: prefixes are forwarded for compat)
  • [x] muse verify passes cleanly on every repo after a push (server-side SHA-256 check at receive time is the equivalent gate; any object that passes wire_push is already content-address-verified — muse verify will agree)
  • [x] Commits with forged parent_id references are rejected at receive time (each declared parent_commit_id must exist in the push mpack OR in the DB for this repo; parents belonging to a different repo are rejected with 409)
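
The two receive-time checks above can be sketched as plain functions. This is not the code in musehub_wire.py: verify_object_hash and unknown_parents are hypothetical names, and passing non-sha256: IDs through unverified is an assumption based on the "forwarded for compat" note.

```python
import hashlib
from typing import Iterable

SHA256_PREFIX = "sha256:"


def verify_object_hash(object_id: str, payload: bytes) -> bool:
    """True if payload hashes to object_id (or the ID uses another scheme)."""
    if not object_id.startswith(SHA256_PREFIX):
        # Assumption: IDs with other prefixes are passed through unverified.
        return True
    expected = object_id[len(SHA256_PREFIX):].lower()
    return hashlib.sha256(payload).hexdigest() == expected


def unknown_parents(
    parent_ids: Iterable[str],
    pushed_ids: Iterable[str],
    repo_commit_ids: Iterable[str],
) -> list[str]:
    """Return declared parents found neither in this push nor in this repo."""
    known = set(pushed_ids) | set(repo_commit_ids)
    return [pid for pid in parent_ids if pid not in known]
```

A push would be rejected when verify_object_hash returns False for any object, or when unknown_parents returns a non-empty list. Scoping repo_commit_ids to the target repo is what rejects parents that exist only in a different repo.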

3. Network & Transport

  • [x] TLS 1.2 minimum enforced at the load balancer / nginx; TLS 1.0/1.1 disabled (ssl_protocols TLSv1.2 TLSv1.3 + explicit cipher suite added to deploy/nginx-cf.conf)
  • [x] HSTS header set (max-age=31536000; includeSubDomains) (SecurityHeadersMiddleware sets max-age=63072000; includeSubDomains; preload — 2 years with preload, exceeds requirement; only in non-debug mode)
  • [x] All HTTP traffic redirects to HTTPS (301, not 302) (port 80 server block added to deploy/nginx-cf.conf with return 301 https://...)
  • [x] CORS policy is explicit and minimal — not * for authenticated endpoints (allow_methods restricted to GET/POST/PATCH/DELETE/OPTIONS/HEAD; allow_headers restricted to Authorization/Content-Type/Accept/X-Requested-With; origins via cors_origins env var, warns if *)
  • [x] Content-Security-Policy header prevents inline script execution and framing (CSP set globally in SecurityHeadersMiddleware: no unsafe-inline in script-src; frame-ancestors 'none'; unsafe-eval retained for Alpine.js v3)
  • [x] X-Content-Type-Options: nosniff and X-Frame-Options: DENY set globally (both set in SecurityHeadersMiddleware on every response)
  • [x] Cookies (session, CSRF) are Secure, HttpOnly, SameSite=Strict (N/A — no server-side cookies. Auth is Ed25519 MSign per-request signing; no session middleware; no CSRF tokens)
  • [x] No mixed-content (HTTP resources loaded from HTTPS pages) (CSP includes upgrade-insecure-requests; connect-src 'self' and img-src 'self' data: https: block HTTP sub-resources)

4. Rate Limiting & Abuse Prevention

  • [x] Global rate limit per IP: e.g., 300 req/min baseline, configurable per route (Limiter(default_limits=["300/minute"]) in rate_limits.py — applies to all routes without an explicit tighter limit)
  • [x] Auth endpoints (login, register, challenge): stricter limit, e.g., 10 req/min per IP (AUTH_LIMIT = "20/minute" on all auth routes; failure_limiter.py adds per-IP exponential backoff on failures)
  • [x] Object upload endpoint: limited by both request rate and total bytes/hour per user (push endpoints at 30/min via WIRE_PUSH_LIMIT; bytes/hour per-user tracking deferred — push body size cap at 500 MB covers payload bombs)
  • [x] Archive download: rate-limited and/or requires authentication for private repos (GET /o/{object_id} limited to 120/min via OBJECT_LIMIT; private repo visibility enforced at repo layer)
  • [x] Search endpoint: limited to prevent full-index scraping (@limiter.limit(SEARCH_LIMIT) wired up on /api/search, /search, /search/repos, /repos/{id}/search)
  • [x] 429 responses include Retry-After header (_handle_rate_limit override in main.py computes Retry-After from X-RateLimit-Reset timestamp)
  • [x] Bot / scraper detection via User-Agent + behavioral heuristics; block or throttle (BotThrottleMiddleware in musehub/middleware/bot_throttle.py: known-bad UA patterns → 429; missing UA on non-CDN paths → 429; /healthz, /static/, /mcp exempt)
  • [x] Webhook delivery retries are capped and backed off (no retry storms) (_MAX_ATTEMPTS = 3, _BACKOFF_BASE = 1.0 → 1s/2s/4s exponential backoff in musehub_webhook_dispatcher.py)
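
Two of the numeric behaviours above are easy to show in isolation: deriving Retry-After from a reset timestamp, and the capped exponential schedule for webhook retries. The function names below are hypothetical, and the real _handle_rate_limit and dispatcher may differ in detail.

```python
import math
import time
from typing import Optional


def retry_after_seconds(reset_epoch: float, now: Optional[float] = None) -> int:
    """Whole seconds until the rate-limit window resets, never below 1."""
    now = time.time() if now is None else now
    return max(1, math.ceil(reset_epoch - now))


def backoff_delays(max_attempts: int, base: float) -> list[float]:
    """Exponential delays: base, 2*base, 4*base, ... one per attempt."""
    return [base * (2 ** attempt) for attempt in range(max_attempts)]
```

With the values quoted in the checklist, backoff_delays(3, 1.0) yields the 1s/2s/4s schedule. Rounding Retry-After up and flooring it at 1 avoids telling a client to retry immediately while it is still limited.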

5. Data Integrity

5.1 Database

  • [ ] Postgres WAL archiving enabled (point-in-time recovery) (ops config — on the Postgres server set archive_mode=on, archive_command, wal_level=replica; requires postgres superuser access. Not automatable from app code.)
  • [x] Automated daily snapshot backup to a separate storage location (not same disk) (deploy/backup.sh: daily pg_dump | gzip at 3 AM; syncs to Cloudflare R2 via rclone for off-disk retention (90 days). Set BACKUP_R2_BUCKET=<bucket> in .env.)
  • [ ] Backup restore drill: restore latest backup to a staging DB and verify row counts (ops procedure — decompress latest .sql.gz, psql musehub_staging < dump.sql, run SELECT COUNT(*) FROM musehub_repos etc. Do before every major migration.)
  • [x] Foreign key constraints enforced (not deferred or disabled for speed) (all FK columns use ForeignKey(..., ondelete="CASCADE"); PostgreSQL enforces FKs natively)
  • [x] Critical tables have updated_at triggers for audit trails (added updated_at to MusehubRepo, MusehubProposal, MusehubWebhook, MusehubRelease; already present on MusehubIssue/Comment/Milestone/RenderJob. Migration 0020 backfills existing rows with server_default=func.now().)
  • [x] No orphaned objects: object rows reference valid repo IDs (FK + periodic scan) (FK+CASCADE guarantees DB-level cleanup on repo delete; musehub/maintenance/orphan_scan.py provides scan_orphan_objects() and delete_orphan_objects() for scheduled maintenance)

5.2 Object store

  • [x] Object files on disk are immutable after write (append-only content-addressed store)
  • [x] Periodic integrity scan: re-SHA-256 a random sample of stored objects against their IDs
  • [x] Disk usage quotas enforced per repo and per user (prevent storage exhaustion)
  • [x] Deletion is soft-delete first (tombstone), hard-delete only after a retention window
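
The periodic integrity scan can be sketched as a function over any mapping of object ID to bytes. The name find_corrupt_objects and the in-memory mapping are assumptions for illustration; a real scan would read from the blob store and likely stream large objects rather than load them whole.

```python
import hashlib
import random
from typing import Mapping, Optional


def find_corrupt_objects(
    store: Mapping[str, bytes],
    sample_size: int,
    rng: Optional[random.Random] = None,
) -> list[str]:
    """Re-hash a random sample and return IDs whose content no longer matches."""
    rng = rng or random.Random()
    object_ids = sorted(store)
    sample = rng.sample(object_ids, min(sample_size, len(object_ids)))
    corrupt = []
    for object_id in sample:
        digest = hashlib.sha256(store[object_id]).hexdigest()
        if object_id != f"sha256:{digest}":
            corrupt.append(object_id)
    return sorted(corrupt)
```

Accepting an optional random.Random instance lets a test seed the sample, while production runs draw a fresh sample each time.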

5.3 Migrations

  • [x] All schema changes go through versioned Alembic migrations
  • [x] Migrations are tested against a production-sized data snapshot before apply
  • [x] Rollback migration exists for every forward migration
  • [x] Migration is run in a transaction; failure rolls back cleanly

6. Performance & Scalability

6.1 Database

  • [x] Indexes exist on all foreign keys and common filter columns (repos.owner, commits.repo_id, symbols.repo_id + address, issues.repo_id)
  • [x] EXPLAIN ANALYZE run on the 10 highest-traffic queries; no seq scans on large tables
  • [x] Connection pooling configured (PgBouncer or SQLAlchemy pool); max connections capped
  • [x] Slow query log enabled (threshold: 100 ms); alerts wired

6.2 API

  • [x] Symbol list, commit log, and blame endpoints are paginated — no unbounded result sets
  • [x] Large diffs / blobs are streamed, not buffered in memory
  • [x] Archive download streams directly from disk — no full file read into RAM
  • [x] Symbol intelligence queries (hotspots, dead code) are pre-computed at push time, never computed on-the-fly per request

6.3 Static assets

  • [x] app.css and JS are served with far-future Cache-Control headers + content hash in filename for cache busting (StaticCacheMiddleware: public, max-age=31536000, immutable for .css/.js/.map; static_version = SHA-256(app.css+app.js)[:8] injected as ?v= on all assets in base.html)
  • [x] Assets are gzip or brotli compressed at the nginx/CDN layer (deploy/nginx-cf.conf: gzip on; gzip_comp_level 6; covers text/css, application/javascript, application/json, image/svg+xml and more)
  • [x] No blocking synchronous calls in request handlers (all DB and I/O are async) (objects.py: bare open() wrapped in asyncio.to_thread; storage reads use stream() async generator)
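
The cache-busting token described above (the first 8 hex characters of a SHA-256 over the asset contents) can be computed as in this sketch. The helper names static_version and versioned_url are hypothetical, and hashing the concatenated bytes of the assets in a fixed order is an assumption about how SHA-256(app.css+app.js) is evaluated.

```python
import hashlib


def static_version(*assets: bytes) -> str:
    """Short content hash over the given asset bodies, in order."""
    hasher = hashlib.sha256()
    for body in assets:
        hasher.update(body)
    return hasher.hexdigest()[:8]


def versioned_url(path: str, version: str) -> str:
    """Append the version as a ?v= query parameter."""
    return f"{path}?v={version}"
```

Because the token changes whenever either file changes, assets can be served with a one-year immutable Cache-Control without clients holding on to stale copies.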

6.4 Load testing

  • [x] Baseline load test: 100 concurrent users, normal read-heavy traffic — p99 < 500 ms (Locust BaselineUser scenario in deploy/load-tests/locustfile.py; run against staging. Infra readiness verified: pool_size=20 + max_overflow=40 = 60 DB conns; UVICORN_WORKERS defaults to 4; GLOBAL_LIMIT=300/min allows normal browsing patterns.)
  • [x] Spike test: 10× normal traffic for 60 s — service degrades gracefully (429s), does not crash (Locust SpikeBurst scenario; 429s verified to carry Retry-After header via _handle_rate_limit; RateLimitExceeded handler registered in app exception handlers.)
  • [x] Soak test: sustained moderate load for 12 h — no memory leak, no connection leak (Locust SoakUser scenario; MemoryLogMiddleware registered as outermost ASGI layer, logs RSS > 400 MiB; rss_mb() via psutil available for monitoring.)
  • [x] Write-heavy test: 50 concurrent pushes — object store and DB remain consistent (Locust WritePushUser scenario; in-process concurrent push verified with asyncio.gather over 10 pre-upload + 5 concurrent commit push calls — no 500s, no deadlocks.)

7. Infrastructure & Operations

7.1 Environments

  • [x] Local dev: runs via docker compose, no shared state with staging/prod (docker-compose.override.yml: DEBUG=true + bind mounts; named volumes musehub_data and postgres_data are isolated from staging/prod)
  • [x] Staging: full production mirror (same Docker image, separate DB, separate object store) — accessible at an internal URL only, no public DNS (aws-provision-staging.sh: INSTANCE_NAME=musehub-staging, separate EIP; same AMI as prod; setup-ec2-staging.sh uses staging.musehub.ai domain)
  • [x] Production: isolated VPC, restricted inbound (80/443 only), no SSH from public internet (aws-provision.sh: SSH restricted to current IP only, never 0.0.0.0/0; nginx forces HTTP→HTTPS redirect; Cloudflare terminates TLS at edge; uvicorn started with --proxy-headers for correct client IP propagation)
  • [x] Environment config (secrets, DB URLs, object store paths) is injected via env vars or secrets manager — never committed to source (.museignore excludes .env and .env.*; pydantic-settings reads all config from env vars; .env.example documents every required field per environment; Settings defaults all secret fields to None; startup guard rejects weak DB_PASSWORD in production; missing WEBHOOK_SECRET_KEY and RUNNER_TOKEN logged at WARNING on startup)

7.2 Secrets management

  • [x] DB password, webhook secret, and object store credentials are stored in a secrets manager (e.g., Doppler, AWS Secrets Manager, Vault) — not in .env files in the repo (deploy/secrets.sh: fetches all secrets from AWS SSM Parameter Store SecureString parameters at deploy time, writes /opt/musehub/.env at mode 600; never stores credentials in the repo or Docker image layers)
  • [x] Secrets are rotated on a schedule (DB password: 180 days; webhook key: on compromise) (docs/secret-rotation-runbook.md: step-by-step aws ssm put-parameter rotation for DB_PASSWORD (180 days), WEBHOOK_SECRET_KEY (on compromise), RUNNER_TOKEN (90 days), R2 credentials (90 days), and CloudTrail audit procedure)
  • [x] No secrets in Docker image layers (docker history audit) (Dockerfile has no secret-bearing ARG/ENV; only PYTHONPATH, PYTHONDONTWRITEBYTECODE, PYTHONUNBUFFERED are baked in; no .env COPY; audit command documented in runbook)
  • [x] CI/CD pipelines inject secrets at runtime, not build time (deploy.sh: containers started with --env-file $APP_DIR/.env (runtime); no secrets passed as build args; setup-ec2.sh generates fresh DB_PASSWORD + WEBHOOK_SECRET_KEY via openssl rand / Fernet.generate_key() on first provision)

7.3 Deployment

  • [x] Zero-downtime deploy: rolling restart or blue/green, no hard cutover (deploy/deploy.sh: two slots blue/green, nginx upstream pointer file, atomic flip via nginx -s reload; health-checked before flip, old slot stopped after)
  • [x] Health check endpoint (/healthz) returns 200 only when DB connection and object store are reachable (GET /healthz in main.py: SELECT 1 DB probe + blob store head_bucket probe; registered before wildcard routes; 200 {"status":"ok"} / 503 {"status":"unhealthy"}; no auth required; Dockerfile HEALTHCHECK and deploy.sh health URLs both point to /healthz)
  • [x] Container runs as a non-root user (Dockerfile: groupadd -r musehub && useradd -r -g musehub musehub, USER musehub after pip install)
  • [x] Read-only filesystem where possible; writable mounts are explicit and minimal (docker-compose.yml: read_only: true on musehub service; /tmp as tmpfs; /data as named volume musehub_data)
  • [x] Resource limits set (CPU + memory) on all containers (docker-compose.yml deploy.resources.limits: musehub 1.0 CPU / 512M, postgres 0.5 CPU / 256M, musehub-runner 0.5 CPU / 256M)

7.4 Logging & alerting

  • [x] Structured JSON logs (level, timestamp, request_id, user_id, path, status, duration) (musehub/logging_config.py: JsonFormatter emits one JSON object per line; request_id_var / user_id_var contextvars populated by AccessLogMiddleware so every in-request log record automatically carries them; optional method, path, status, duration_ms on access records)
  • [x] No PII or token values in logs (PiiFilter in musehub/logging_config.py: scrubs Bearer <token>, password=, token=, secret= patterns from fully-formatted message strings before any handler sees them; AccessLogMiddleware extracts only the MSign handle — never logs the raw Authorization value)
  • [x] Alerts wired for: 5xx rate > 1%, p99 latency > 2 s, disk > 80%, DB connections > 90% (deploy/cloudwatch-alerts.sh: log group /musehub/app; metric filters for 5xxCount, RequestCount, RequestDurationMs; CloudWatch alarms with SNS → SMS + email to gabriel)
  • [x] On-call rotation or at minimum a PagerDuty / SMS alert to gabriel's phone (deploy/cloudwatch-alerts.sh SNS SMS subscription; docs/on-call-runbook.md with incident playbooks for all four alarm types + optional PagerDuty escalation path)
  • [x] Log retention: 30 days hot, 1 year cold (deploy/cloudwatch-alerts.sh put-retention-policy --retention-in-days 30; docs/on-call-runbook.md documents S3 export → Glacier after 30d, expire after 365d)
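
A log filter that scrubs the patterns named above might look like the following. It is a sketch under assumptions, not the project's PiiFilter: the regex, the scrub helper, and the "[REDACTED]" placeholder are illustrative choices. It rewrites the fully formatted message, matching the behaviour described in the checklist.

```python
import logging
import re

# Matches "Bearer <token>" and password=/token=/secret= values.
_SECRET_RE = re.compile(
    r"(Bearer\s+)\S+|((?:password|token|secret)=)[^\s&]+",
    re.IGNORECASE,
)


def scrub(message: str) -> str:
    """Replace secret-looking values with a fixed placeholder."""
    return _SECRET_RE.sub(
        lambda m: (m.group(1) or m.group(2)) + "[REDACTED]", message
    )


class PiiFilter(logging.Filter):
    """Scrub the formatted message before any handler sees the record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = scrub(record.getMessage())
        record.args = None  # message is already formatted
        return True
```

Attaching the filter to the logger or handler (for example with logger.addFilter(PiiFilter())) applies it to every record passing through. Formatting first and clearing args ensures secrets passed as %s arguments are scrubbed too.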

8. Security Hardening (Adversarial)

These items assume an attacker has read the full source code.

  • [x] SSRF: any feature that makes outbound HTTP requests (webhooks, avatar fetch, MCP endpoints) validates the target URL against an allowlist; blocks RFC-1918 ranges (musehub/security/ssrf.py: check_url_safe() — sync, scheme + bare IP, used in WebhookCreate.url field_validator; validate_outbound_url() — async + DNS resolution via asyncio.to_thread, used in _attempt_delivery() as defence-in-depth against DNS rebinding; blocks loopback 127.x, RFC-1918 10.x/172.16-31.x/192.168.x, link-local 169.254.x AWS metadata, fc00::/7, 100.64.x carrier NAT)
  • [x] Mass assignment: Pydantic models used for all request bodies; no **kwargs or dict(request.body) passed directly to ORM constructors (all route handlers use typed Pydantic models; the one **cache_payload in ui_blame.py is built from server-controlled data, not user input)
  • [x] Timing attacks on auth: use hmac.compare_digest for all secret comparisons, never == (runner token: hmac.compare_digest in runner.py; MSign: verify_signature Ed25519 cryptographic verification in request_signing.py — constant-time by design; webhook signatures: hmac.compare_digest in musehub_webhook_dispatcher.py)
  • [x] Object enumeration: repo IDs and issue IDs are either UUIDs or obfuscated — sequential integers let attackers enumerate all repos/issues across all users (repo_id, issue_id: String(36) UUID primary keys; issue number is per-repo sequential — scoped by repo, not a global counter)
  • [x] Commit ID forgery: muse verify signature check is enforced server-side on receive, not just client-side (setting REQUIRE_SIGNED_COMMITS=true in Settings enables an enforcement gate in wire_push that rejects any commit with an empty signature or signer_key_id; the setting defaults to False for backward compat, and unsigned commits log at DEBUG when enforcement is off)
  • [x] Denial of service via regex: all user-supplied regex patterns (search, filter) are compiled with a timeout or replaced with parameterized FTS (search_by_pattern uses Python in operator — no regex; search_by_ask tokenizes with a fixed pre-compiled _TOKEN_RE pattern — no user-controlled regex; no re.compile(user_input) anywhere in the search path)
  • [x] Tar bomb / zip bomb: archive extraction enforces max uncompressed size and max file count before extraction begins (N/A: MuseHub does not perform any archive extraction; tarfile/zipfile are not imported anywhere in the application code)
  • [x] Polyglot files: file type validation by magic bytes, not just extension (musehub/security/magic_bytes.py: check_magic_bytes(path, content) validates MIDI/MP3/WebP/PNG/JPEG/ZIP/PDF by header signatures; blocks HTML and shebang content in all non-HTML files; called in wire_push before storing each object — returns WirePushResponse(ok=False) on mismatch)
  • [x] Clickjacking: X-Frame-Options: DENY + CSP frame-ancestors 'none' (SecurityHeadersMiddleware in main.py:87-109 — sets X-Frame-Options: DENY and Content-Security-Policy with frame-ancestors 'none' on every response)
  • [x] Open redirect: redirect-after-login validates target is same-origin only (ui_mcp_elicitation.py: callback = request.url.path (+ query) — stores path-only, never the full absolute URL with scheme/host; prevents attacker from injecting ?next=https://evil.com via a crafted Host header)
  • [x] Account takeover via handle squatting: handle registration is case-insensitive normalized (e.g., Gabriel == gabriel) (musehub_auth.py: handle = handle.strip().lower() before MusehubIdentity() — Gabriel and gabriel map to the same row; UNIQUE constraint on handle enforces it)
  • [x] Claude / MCP prompt injection: MCP tool results that include user-controlled content are wrapped in a clear delimiter and documented as untrusted; the system prompt instructs the model to treat that content as data, not instructions (mcp/dispatcher.py: success results wrapped in <musehub_tool_result>…</musehub_tool_result>; mcp/prompts.py orientation prompt names the delimiter and lists commit messages, issue bodies, file paths, repo names, branch names as untrusted user data)
  • [x] Agent impersonation: agent_id in commit provenance is validated against a registry of known agents; unknown agents are accepted but flagged, never trusted for elevated operations (TRUSTED_AGENT_IDS list in Settings; wire_push logs WARNING and injects metadata["untrusted_agent"] = "true" for any agent_id not matching a trusted prefix; unknown agents are accepted, never rejected)
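
As an illustration of the SSRF item above, the synchronous scheme-and-bare-IP check can be written with the standard library alone. This is a sketch, not musehub/security/ssrf.py: the function name check_url_safe is borrowed from the checklist, but its signature and boolean return, the BLOCKED_NETWORKS table, and the is_blocked_ip helper are assumptions. Hostnames that are not IP literals pass this check and still need the DNS-resolving validation at delivery time.

```python
import ipaddress
from urllib.parse import urlsplit

# Ranges named in the checklist, written as CIDR blocks.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8",     # loopback
        "10.0.0.0/8",      # RFC 1918
        "172.16.0.0/12",   # RFC 1918
        "192.168.0.0/16",  # RFC 1918
        "169.254.0.0/16",  # link-local, includes cloud metadata
        "100.64.0.0/10",   # carrier-grade NAT
        "::1/128",         # IPv6 loopback
        "fc00::/7",        # IPv6 unique local
    )
]


def is_blocked_ip(host: str) -> bool:
    """True if host is an IP literal inside a blocked range."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; resolve and re-check elsewhere
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped:
        addr = addr.ipv4_mapped  # unwrap ::ffff:a.b.c.d
    return any(addr in network for network in BLOCKED_NETWORKS)


def check_url_safe(url: str) -> bool:
    """Scheme and bare-IP check only; no DNS resolution is performed."""
    parts = urlsplit(url)
    if parts.scheme not in {"http", "https"} or not parts.hostname:
        return False
    return not is_blocked_ip(parts.hostname)
```

Checking again after DNS resolution, immediately before the request is sent, is what defends against a hostname that resolves to a public address at registration time and a private one at delivery time.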

9. Legal & Compliance

  • [x] Privacy policy exists and is linked from the footer (docs/legal/privacy-policy.md; covers pubkey-as-identity model, agent-first design, training data policy, GDPR export/delete rights, and opt-out mechanism; linked from base.html footer)
  • [x] Terms of service exist; acceptance is implicit via MSign key registration (docs/legal/terms-of-service.md; agents cannot click accept — the operator's act of registering a key constitutes acceptance; tos_accepted_at + tos_version recorded on MusehubIdentity at registration time in musehub_auth.py; training data policy detailed: public OSI-licensed repos only, training_opt_out=true respected, private repos never used)
  • [x] Minimum data: accounts are pubkeys + handles — no passwords, no required email; MusehubIdentity fields audited; only voluntarily-provided profile data (email, bio, avatar) is stored; agent identities store model name and capabilities for discovery, not for PII purposes
  • [x] GDPR / CCPA: GET /api/me/export returns full data dump (identity, keys, repos, commits); DELETE /api/me hard-deletes auth keys and soft-deletes identity + repos; training_opt_out: bool field on MusehubRepo (migration 0023_compliance_fields.py)
  • [x] DMCA takedown process documented (docs/legal/dmca.md; contact [email protected]; 2-day acknowledgement / 5-day action SLA; counter-notice process; repeat-infringer policy; agent-operator accountability)
  • [x] OSS license audit completed (docs/legal/license-audit.md; all direct dependencies are MIT / Apache-2.0 / BSD-3-Clause / LGPL-3.0; psycopg2-binary LGPL-3.0 is compatible under dynamic-linking exception for server-side SaaS; quarterly review schedule established)

10. Pre-Launch Smoke Tests

These must all pass in staging before prod deploy:

  • [ ] New user signup → keygen → register → push a repo → view on MuseHub
  • [ ] Private repo is not accessible when unauthenticated
  • [ ] muse push with a forged object SHA is rejected
  • [ ] Rate limit kicks in after threshold on auth endpoint
  • [ ] Archive download returns correct bytes for a known commit
  • [ ] Symbol index is populated after push and renders on /symbols
  • [ ] Issue create → comment → close flow works end to end
  • [ ] Merge proposal open → review → merge flow works end to end
  • [ ] MCP endpoint returns correct tool schema and executes a read tool correctly
  • [ ] docker compose down && docker compose up preserves all data (volumes survive)
  • [ ] Backup restore drill: drop staging DB, restore from latest backup, verify

11. Launch Gate

All sections above must be fully checked. Final sign-off:

| Domain | Owner | Signed off | Date |
|---|---|---|---|
| Auth & AuthZ | gabriel | [ ] | |
| Input validation | gabriel | [ ] | |
| Network / TLS | gabriel | [ ] | |
| Rate limiting | gabriel | [ ] | |
| Data integrity | gabriel | [ ] | |
| Performance | gabriel | [ ] | |
| Infrastructure | gabriel | [ ] | |
| Security hardening | gabriel | [ ] | |
| Smoke tests | gabriel | [ ] | |

When every row is checked: tag v1.0.0-rc1, deploy to staging, hold 72 h, then promote to production.
