# Companion App - Phase 5: Bind Gate (sockets · spawn · keychain · download)

**Status:** 🧠 Thinking design gate; **RATIFICATION REQUESTED.** This document makes design decisions only. **No companion shell code is written or approved to run by this document.** It fixes the contract under which the Phase 5 implementation may, for the first time, open a real socket, spawn a real process, read the real OS keychain, and perform a real TLS download.

**Branch:** `feat/companion-app` (Muse-canonical; paired with the Phase 1–4 code already on this branch; **not** a docs-only PR to `main`, per the owner's no-docs-only-PR-to-`main` policy).

**Phase table ref:** Gate §12, Phase 5 ("companion app shell, integrating phases 2–4"). The phase table marks the *implementation* ⚡ Sonnet/auto; this **bind-gate design** is elevated to 🧠 Thinking because it is the first phase that performs ambient-authority-bearing I/O, and a wrong seam here is an ambient-authority vulnerability, a supply-chain compromise, or an OS-permission overreach. Mirrors the 🔀 Hybrid pattern: design the seam with a thinking model, implement against the fixed contract with Sonnet/auto.

**Depends on (all accepted):** [`COMPANION-APP-DESIGN-AND-AUTHORIZATION-GATE.md`](COMPANION-APP-DESIGN-AND-AUTHORIZATION-GATE.md) (§4 the eight loopback controls, §4.6 no ambient authority, §7 packaging, §12 phase table, the "DOES NOT approve" list); Phase 0 (gate §13, D1–D3); Phase 1 ([`COMPANION-APP-PHASE-1-ADAPTER-SEAM.md`](COMPANION-APP-PHASE-1-ADAPTER-SEAM.md), `lib/model-runtime-lane.mjs`); Phase 2 ([`COMPANION-APP-PHASE-2-LOOPBACK-SECURITY.md`](COMPANION-APP-PHASE-2-LOOPBACK-SECURITY.md), `lib/companion-loopback-guard.mjs`); Phase 3 ([`COMPANION-APP-PHASE-3-OAUTH-PKCE.md`](COMPANION-APP-PHASE-3-OAUTH-PKCE.md), `lib/companion-oauth-pkce.mjs`, `lib/companion-token-custody.mjs`); Phase 4 ([`COMPANION-APP-PHASE-4-RUNTIME-MANAGER.md`](COMPANION-APP-PHASE-4-RUNTIME-MANAGER.md), `lib/companion-runtime-manager.mjs`); the **server-side OAuth gate** ✅ DONE ([`COMPANION-APP-OAUTH-SERVERSIDE-GATE.md`](COMPANION-APP-OAUTH-SERVERSIDE-GATE.md), `hub/gateway/native-oauth-provider.mjs`).

---

## Simple summary

Until now, every piece of the companion app has been built as **pure rules with no hands**: it can *decide* whether a request is safe, whether a sign-in is valid, whether a downloaded model is genuine, and whether the runtime may serve a request, but it cannot actually open a door, start a program, read your password vault, or download a file. This phase is where the rules **grow hands**. That is the single most dangerous step in the whole project, so this document does not write any of that code. It writes the **rulebook the hands must obey** and argues every rule against a specific attacker:

- **The two doors** (one for local AI requests, one for the brief sign-in reply) open on a *random* internal number, on your machine only, never to the outside network.
- **The password vault** (OS keychain) is touched through the narrowest possible opening: store one secret, read one secret, delete one secret; nothing that can list or dump everything.
- **Starting the AI program** is done in a locked-down way: exact program, no shell, a stripped-down environment so the AI program can never see your sign-in token or vault, and it dies when the companion dies.
- **Downloading a model** checks the file's fingerprint against a trusted record *before* the program is ever run on it, and the trusted record comes from somewhere the download server can't forge.
- **Checking how much memory/GPU is in use** is scoped to the AI program only, and must never spy on what your other apps are doing.
- The "**AI is ready on this device**" switch only flips on after the file is verified, the program is genuinely up, and a real round-trip check succeeds, and it flips off the instant anything is wrong.
- Finally: the parts with hands (start a program, download a file) are physically built so they can **never** hold your sign-in token, your vault handle, or the keychain; they only get a file path, a port, and a URL.

## Technical summary

Phases 1–4 and the server-side OAuth gate delivered every **decision core** and the one server-side route (`/api/v1/auth/native`) the companion needs. Each pure module deferred its real I/O to "Phase 5 behind an explicit gate." This document **is** that gate. It ratifies eight decisions (D5.1–D5.8) covering the five I/O-bearing seams that converge in the companion shell: **socket bind** (the Phase 2 inference listener and the Phase 3 OAuth loopback redirect listener), the **OS-keychain adapter** (custody of the JWT, refresh token, and Phase 2 loopback token), the **spawn adapter** (the bundled Ollama/llama.cpp runtime), the **download adapter** (TLS model fetch wired into Phase 4's `createIntegrityAccumulator`), and the **resource-probe adapter**, plus the **Phase 1 seam activation** rule (`companionAvailable`) and the **no-ambient-authority enforcement mechanism**. Every decision defaults **fail-closed**: any ambiguity at bind, custody, spawn, download, or probe denies rather than proceeds.
No secret (loopback token, JWT, refresh token, authorization code, `code_verifier`, `SESSION_SECRET`, model digest) appears in any log, error, adapter interface, or redirect. The implementation that follows this gate runs **from source on a first-party machine** and must ship its own 7-tier suite for the bind/lifecycle layer before any merge to `main`; **packaging, code-signing, notarization, and auto-update remain Phase 7** and are *not* approved here.

---

## 0. What this gate lifts, and what it deliberately does not

The design gate's ["DOES NOT approve (no code)"](COMPANION-APP-DESIGN-AND-AUTHORIZATION-GATE.md) list named, among others, *"opening any new local HTTP listener / loopback model endpoint"* and *"shipping any companion binary … or bundled runtime."* Phase 5 is the gate that lifts a **bounded subset** of that list, and only that subset.

| Item from the design gate's "DOES NOT approve" list | Phase 5 disposition |
| --- | --- |
| Opening a new local HTTP listener / loopback model endpoint | **LIFTED** for the **two loopback listeners** specified in D5.1/D5.2 (inference + OAuth redirect), bound `127.0.0.1`/`[::1]` only, under the Phase 2 guard. |
| Running an integrated companion process that binds sockets, spawns the runtime, reads the keychain, downloads a model | **LIFTED** for **first-party run-from-source** on the owner's/developer's machine, under D5.1–D5.8. |
| Shipping a companion **binary**, tray installer, **auto-updater**, code-signing/notarization | **NOT lifted**; remains **Phase 7** (distribution gate). Phase 5 runs from source; it does not produce or distribute a signed artifact. |
| New canister routes / new Hub REST endpoints / wire-protocol changes | **NOT needed**; the one required server route (`/api/v1/auth/native`) was delivered by the server-side OAuth gate (✅ DONE). Phase 5 adds none. |
| New derived-artifact storage paths / encryption scheme | **NOT lifted**; remains **Phase 6**. Phase 5 performs **inference**; enrichment write-back storage is Phase 6. |
| Any change to OAuth client registration or scopes | **NOT lifted / already bounded**; the server-side OAuth gate fixed the native client + scope ceiling (`scopesForRole`). Phase 5 only *consumes* it as a client; it changes nothing server-side. |

**Net:** Phase 5 means "the rules may now have hands, on a first-party machine, under the contract below." It does **not** mean "ship a product." Distribution is a separate, later gate.

---

## 1. Adversarial threat model (the bind surface)

Phases 2 and 3 modelled the *request* and the *protocol*. Phase 5 adds the **physical I/O**, so the threat model expands to the host itself. Each attacker is paired with the **exact Phase 5 control** that stops it, argued against the attacker, not pattern-matched.

| # | Attacker capability | Exact Phase 5 control | Decision |
| --- | --- | --- | --- |
| **P-a** | **Malicious web page** issues `fetch()` to the companion's loopback listener (and to the runtime's internal port). | Phase 2 guard on the front-door (token + `Host` + `Origin`/`Sec-Fetch-Site` + no permissive CORS); the runtime's internal port emits **no CORS** and carries **no authority** (a no-cors POST can spend compute but can never read a response cross-origin or reach data). | D5.1, D5.4 |
| **P-b** | **DNS-rebinding** to reach the loopback bind. | Loopback-only bind (`127.0.0.1`/`[::1]`, never `0.0.0.0`/`::`) + Phase 2 `Host` allowlist. Ephemeral port is *not* relied on as a control. | D5.1 |
| **P-c** | **Local same-user malware** connects directly to the listener or the runtime's internal port, bypassing the front-door. | Per-session bearer token on the front-door (Phase 2); the runtime back-end is reachable only via a **Unix-domain socket (mode `0600`)** where supported, else a loopback TCP port that carries **inference only, no authority and no secret**, so direct access spends local compute but exfiltrates nothing. Rate-limited. | D5.1, D5.4, D5.8 |
| **P-d** | **Local process races to bind** the OAuth redirect port / inference port, or steals it after the companion exits. | OS-assigned ephemeral port chosen per use; redirect listener is **one-shot** (bound only for one auth attempt, closed after one callback); no `SO_REUSEPORT`; stale-runtime reclaim on start; loopback token rotated each start so a stale listener is **inert** against the new session. | D5.1, D5.2, D5.4 |
| **P-e** | **Keychain read** by another same-user process (or a backup/cloud-sync exfil). | Narrowest adapter surface (`get`/`set`/`delete` on four fixed accounts; no enumerate/dump); macOS `…ThisDeviceOnly` accessibility (**no iCloud sync**); ACL bound to the app where the OS supports it; the loopback token is **separate, per-session, rotated** so its compromise is not the JWT's compromise. Linux Secret Service's weaker same-user isolation is documented as a known platform limitation. | D5.3 |
| **P-f** | **Supply-chain**: poisoned model file from a compromised CDN/mirror or MITM (valid TLS, malicious bytes). | Phase 4 `createIntegrityAccumulator`: SHA-256 + exact size, **constant-time** digest compare, **before** spawn. The trust anchor (`expectedDigest`) comes from a **first-party signed manifest**, *not* the model host, so the entity that controls the download cannot also forge the digest. Atomic temp→verified move; spawn only from the verified path. | D5.5 |
| **P-g** | **Escaped/orphaned runtime** keeps running, holding a port or compute, after the companion crashes; or a **poisoned runtime** attempts RCE→data exfil. | Spawn in the companion's process group (dies with it); stale-runtime detect-and-kill on start; **scrubbed environment** (no `SESSION_SECRET`/JWT/keychain in the child env); no shell, argv array, absolute binary path; the runtime is handed **only** a file path + port + resource flags, **never** a vault/JWT/keychain handle. | D5.4, D5.8 |
| **P-h** | **Resource-probe privacy leak**: GPU telemetry exposes *other* apps' processes/VRAM. | Probe is scoped to the **runtime's own PID**; VRAM read as **aggregate headroom only**; other processes' PIDs/names/usage are **never enumerated, logged, or persisted**; no privilege escalation for telemetry (skip VRAM rather than escalate). | D5.6 |
| **P-i** | **Premature lane selection**: the local lane is chosen against a runtime that is downloading, starting, dead, or unverified. | `companionAvailable` flips true **only** after integrity verified **and** lifecycle `ready` **and** a real health round-trip succeeded, backed by a recency bound; flips false immediately on drain/stop/health-fail/crash. Fail-closed default false. | D5.7 |
| **P-j** | **Ambient-authority pivot**: the spawn/download/probe path acquires a vault handle, the JWT, or keychain capability and exfiltrates. | Object-capability segregation: the **authority group** (keychain + OAuth/session + canister) and the **runtime group** (spawn/download/health/stat) are constructed in disjoint scopes with no shared reference; enforced by an architecture/import test + a child-env-scrub test. | D5.8 |
| **P-k** | **Secret in logs/errors** at any I/O boundary (the new, dangerous surface). | All adapters log fixed reason codes only; never the token, JWT, refresh token, code, `code_verifier`, digest, URL, or model path. Reuses Phases 2–4's "no secret in output" posture at the I/O layer. | all |

---

## 2. Decision D5.1: Inference loopback socket bind contract

> Question: what port-selection strategy for the loopback inference listener, and what is the threat
> model for port fixation?

### Verified state

Phase 2 §6 already specifies the bind obligations the guard cannot enforce by itself: `listen(0, '127.0.0.1')`, never `0.0.0.0`; non-predictable ephemeral port; build `allowedHosts` from the actual bound port; never emit permissive CORS.
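
As a rough illustration of the "build `allowedHosts` from the actual bound port" obligation, the sketch below derives the accepted `Host` values from a port number and rejects everything else, which is the property that blunts DNS rebinding (P-b). `buildAllowedHosts` and `isAllowedHost` are hypothetical helper names, not the Phase 2 API, and accepting only the loopback IP literals (no `localhost`) is an assumption made for this example.

```javascript
// Hypothetical helpers (not the Phase 2 API): derive the Host allowlist
// from the port the OS actually assigned, then match requests exactly.
function buildAllowedHosts(boundPort) {
  if (!Number.isInteger(boundPort) || boundPort < 1 || boundPort > 65535) {
    throw new Error('invalid_port'); // fail closed on a nonsensical port
  }
  // Assumption: only loopback IP literals are accepted, never `localhost`.
  return new Set([`127.0.0.1:${boundPort}`, `[::1]:${boundPort}`]);
}

function isAllowedHost(hostHeader, allowedHosts) {
  // A missing, repeated (array), or otherwise non-string Host is denied.
  if (typeof hostHeader !== 'string') return false;
  return allowedHosts.has(hostHeader.toLowerCase());
}

const allowed = buildAllowedHosts(49731); // port value is illustrative
console.log(isAllowedHost('127.0.0.1:49731', allowed)); // true
console.log(isAllowedHost('attacker.example:49731', allowed)); // false
```

A rebinding page reaches the socket with its own hostname in `Host`, so an exact match against values built from the bound port denies it regardless of what the attacker's DNS record resolves to.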
Phase 4 §5.2 repeats the loopback-bind + ephemeral-port requirement for the runtime. The Phase 2 guard (`verifyLoopbackRequest`) is the admission control; the bind is the deferred I/O.

### Decision: **OS-assigned ephemeral port, loopback-only, port secrecy is NOT a control**

1. **Bind `listen(0, '127.0.0.1')`**: the OS assigns an ephemeral port. If IPv6 loopback is offered, bind `[::1]` **explicitly**; **never** bind `0.0.0.0`, `::`, or any routable interface. A bind that can only be satisfied on a non-loopback interface is a **hard startup abort**, never a fallback to a broader interface.
2. **Port fixation threat model (confirmed):** an ephemeral port raises the cost of *blind* probing, but a local process can scan the ~16k-wide ephemeral range in milliseconds, and a web page can fire no-cors requests across it. Therefore the ephemeral port is **defense-in-depth only**; the real controls are the Phase 2 bearer token + `Host` allowlist + `Origin`/`Sec-Fetch-Site` check + rate limit. **The design must never treat the port as a secret or as an access control.** (This is why a *fixed* well-known port is rejected: it gives an attacker a free, stable target and a fingerprint, for zero security benefit.)
3. **Exclusive ownership:** do **not** set `SO_REUSEPORT` (it would let a second local process bind the same port and split/hijack traffic). The listener owns its port for the session.
4. **Port persistence:** the bound port may be recorded for the local session only, in a **user-private** location (file mode `0600` / per-user keychain-adjacent store), never world-readable and never logged at info level. It is **not** a secret, but minimizing its broadcast is free defense-in-depth.
5. **Two-listener separation (critical, see D5.4):** the *guarded front-door* the companion exposes is distinct from the *runtime's own internal listener*. Both are loopback; the front-door enforces the Phase 2 guard, the back-end carries no authority.
   Direct access to the back-end must never bypass an authority boundary (it bypasses only the rate limiter and serves inference, which is acceptable because the back-end holds no secret and no data path; see D5.4/D5.8).

### Fail-closed

Cannot bind loopback → abort startup (no listener, `companionAvailable` stays false). Port-store write fails → continue (the port is not a secret) but never broaden the bind to compensate.

---

## 3. Decision D5.2: OAuth redirect loopback listener bind

> Question: same ephemeral strategy for the PKCE loopback redirect? Confirm it can be a **different**
> port from the inference socket.

### Verified state

Phase 3 §6 requires the redirect listener bound `listen(0, '127.0.0.1')` (or `[::1]`), the `redirect_uri` constructed from the actual bound port and re-validated by `validateRedirectUri`, and the `state` discarded after one callback. The server-side gate (✅ DONE) confirmed the native provider accepts any loopback ephemeral port at registration and exact-matches it within one flow (RFC 8252 §7.3 + RFC 6749 §4.1.3); `native-oauth-provider.mjs` `isLoopbackUri` enforces `127.0.0.1`/`[::1]` literals only and rejects `localhost`.

### Decision: **Separate, short-lived, one-shot ephemeral redirect listener**

1. **Different listener, different port.** The OAuth redirect listener is a **distinct** server from the inference listener, bound independently via `listen(0, '127.0.0.1')` (or `[::1]`). They are different listeners with different lifecycles, so they get different ports; this is confirmed and required. Confusing them (or reusing the inference port for the redirect) is forbidden.
2. **One-shot lifecycle.** The redirect listener is bound **only** for the duration of a single authorization attempt and is **torn down immediately** after it receives exactly one callback (or on timeout/abort). This minimizes the window in which a local process (P-d) can race the port.
   PKCE already makes an intercepted code useless (Phase 3 threat **a**), but a one-shot listener also shrinks the race surface.
3. **Strict request handling.** Accept only `GET` on the exact registered callback path; hand the query params to `validateAuthorizationResponse({ params, expectedState, expectedIssuer })`. The native provider now emits `iss` (server-side gate C3), so Phase 5 **MUST pass `expectedIssuer`** and treat a mismatch as fatal (full mix-up defense, Phase 3 threat **c**). No bearer-token check here: this listener authenticates by `state` + `iss` + PKCE, not by the loopback token.
4. **Per-attempt secrets in memory only.** `code_verifier`, `state`, and `nonce` live in memory for the attempt; `state` is single-use and discarded after one callback. Never written to disk, never logged.
5. **System browser only.** The authorization URL (from `buildAuthorizationUrl`) is opened in the OS default browser; an embedded webview is forbidden (RFC 8252 §8.12, Phase 3 threat **i**).

### Fail-closed

Redirect listener cannot bind loopback → abort the auth attempt (no browser launch). Callback fails `validateAuthorizationResponse` (bad `state`/`iss`/error) → abort, generic message, never log the raw callback. Timeout with no callback → tear down the listener and surface a generic failure.

---

## 4. Decision D5.3: OS-keychain adapter surface

> Question: the minimal keychain API the companion needs; per-OS backends; what does a keychain read
> grant?

### Verified state

Phase 3 custody (`companion-token-custody.mjs`) is written against an **injected** `{ get, set, delete }` adapter over four fixed accounts (`KEYCHAIN_ACCOUNTS`: `accessToken`, `refreshToken`, `sessionMeta`, `loopbackToken`), with `MAX_SECRET_LEN` bounds and fail-closed loads. The adapter calls are awaited (sync or Promise both work). Phase 5 supplies the real backend.

### Decision: **Exactly `get`/`set`/`delete` on four named accounts; nothing wider; device-local**

1. **Minimal surface.** The real adapter implements **only** `get(account)`, `set(account, secret)`, `delete(account)`. **No `list`, no `enumerate`, no `getAll`, no wildcard/prefix query, no "dump."** A compromised or buggy adapter cannot discover what it does not already name. Unknown account names are **rejected fail-closed** (the adapter accepts only the four `KEYCHAIN_ACCOUNTS` literals); enforce `MAX_SECRET_LEN`; never log or return a secret in an error.
2. **Per-OS backends (least-privilege, device-local):**
   - **macOS: Keychain Services** generic-password items, accessibility **`kSecAttrAccessibleWhenUnlockedThisDeviceOnly`** (no iCloud Keychain sync; the JWT/refresh token must **not** leave the device), with the item ACL bound to the companion's signed code identity where available (tightened further by Phase 7 signing).
   - **Windows: DPAPI** (`CryptProtectData`) at **per-user** scope (never `LOCAL_MACHINE`), optionally surfaced through Credential Manager; entropy parameter set per item.
   - **Linux: libsecret** (Secret Service API) with schema-scoped attributes in the default collection. **Known limitation:** the Secret Service does not isolate by application; any unlocked-session same-user process can read the collection. This is a platform constraint, not a defect we can close in the adapter; it is the primary reason the loopback token is **separate, per-session, and rotated** (so its blast radius is one session) and a reason Linux users who want stronger isolation should run a locked keyring.
3. **Threat: what a keychain read grants (confirmed, sobering):** a successful read of these accounts yields the **access-token JWT** (act as the user against the hosted gateway with the `scopesForRole` ceiling, read **and** write), the **refresh token** (mint new JWTs until reuse-detection family-revoke trips), and the **loopback token** (drive the local inference front-door).
   That is **data-plane account compromise**, not merely local-inference access. Consequences and bounds:
   - The JWT is **no worse than a stolen web-session JWT**, and strictly better on refresh (the server-side gate backs native refresh with `refresh-token-core` rotation + reuse→family-revoke).
   - macOS `…ThisDeviceOnly` + ACL and Windows per-user DPAPI raise the bar to a same-user, same-device attacker; Linux is weaker (above).
   - The loopback token's separation means stealing it does **not** yield the JWT and vice-versa (Phase 3 custody §4: stored under its own account, rotated each start, independent of OAuth logout).
4. **No plaintext fallback, ever.** If the OS keychain is unavailable/locked, the adapter **fails closed** (the operation errors and the dependent flow aborts); it **never** falls back to a dotfile, env var, or plaintext store. A locked keychain means "re-auth / retry," not "store insecurely."

### Fail-closed

Unknown account → reject. Keychain unavailable/locked → error (no plaintext fallback). Corrupt/oversize stored value → custody `loadSession` already returns `null` → caller treats as `reauth`.

---

## 5. Decision D5.4: Spawn adapter (process-management surface)

> Question: the minimal spawn/kill/health-probe surface; what happens if the spawned process escapes
> supervision?

### Verified state

Phase 4 `RuntimeAdapterFns` types the surface as `spawn(opts) → SpawnHandle`, `handle.kill()`, `healthCheck(handle) → boolean`, `statResources()` (the last is D5.6). `SpawnOpts` carries `{ binaryPath, modelPath, port, maxRamBytes }`, with **no** vault/JWT/keychain field by construction. Phase 4 §5.2 mandates spawn only after integrity passes, bind `127.0.0.1`, ephemeral port, and the Phase 2 guard in front.

### Decision: **`spawn` + `kill` + `healthCheck` only; hardened launch; supervised lifetime**

1. **Minimal surface.** `spawn`, `kill`, `healthCheck`; nothing else (resources = D5.6, download = D5.5).
   The handle exposes `pid` + `kill()` only.
2. **Hardened launch (every item is a control, not a style choice):**
   - **Absolute `binaryPath`** to the bundled runtime, **never** a `PATH` lookup (prevents `PATH`-injection of a malicious `ollama`).
   - **`shell: false`** + **argv array**, never a concatenated command string (no shell-injection from any spec field).
   - **Scrubbed environment**: the child receives a **minimal allowlist** of env vars (`HOME`/`TMPDIR`/locale as needed); it is **explicitly stripped** of `SESSION_SECRET`, any `*_API_KEY`, the JWT, refresh token, loopback token, and anything keychain-related. Env is the classic ambient-authority leak; this closes it (see D5.8).
   - **Loopback bind flag** passed to the runtime so it binds `127.0.0.1`/UDS only, plus the resource ceiling flags derived from `maxRamBytes`.
   - **`detached: false`**, child placed in the companion's **process group** so it cannot outlive an orphaning parent.
3. **The runtime back-end is not a second authority-bearing endpoint (resolves P-a/P-c at the runtime port):** the companion exposes the **guarded front-door** (D5.1); the runtime listens **behind** it. Preference order for the back-end channel:
   - **Preferred:** a **Unix-domain socket with mode `0600`** (owner-only) where the runtime supports it; this makes the back-end reachable only by the companion's user and removes the loopback-TCP bypass entirely.
   - **Fallback (TCP-only runtimes):** bind the runtime to `127.0.0.1` on its own ephemeral port, emit **no CORS**, and accept the **documented residual**: any same-user local process (and a web page via no-cors, write-only) can spend local inference compute on it. This is acceptable **only because** the back-end holds **no secret and no data path**: it serves inference and nothing else (D5.8), so a direct hit wastes compute but exfiltrates nothing. The front-door remains the sole *authenticated* path; the rate limiter lives there.
4. **Escape / supervision threat model (confirmed):**
   - **Orphan holding the port/compute after a companion crash:** on every start, the companion **probes the persisted port for a stale runtime and kills it before rebinding** (detect-and-reclaim). Because the loopback token is **rotated each start** (Phase 3 custody), a stale runtime from a prior session is **inert**: the new front-door's `expectedToken` no longer matches, and a stale back-end serves no one useful. On clean shutdown, kill the **process group**.
   - **Poisoned runtime (model-driven RCE):** the **only** path to spawn is a **verified** model (D5.5 + Phase 4 single-path-to-`ready`); the child runs with the **scrubbed env** (no secrets, no keychain, no JWT; D5.8) and **loopback/UDS bind only**, so even arbitrary code in the runtime process cannot read the keychain handle (it was never in scope), cannot read the JWT (not in env), and cannot reach the canister as the user (no token). Stronger OS sandboxing (seatbelt/AppContainer/seccomp, entitlements) is **Phase 7**; D5.4 mandates the env-scrub + no-shell + argv + process-group + loopback-bind baseline now.
5. **Health probe** goes through the **same admission path** the runtime serves (the front-door, with the loopback token) so there is exactly **one** way to reach inference; the probe is `GET /v1/models` / `GET /api/tags`. A health success drives `transitionLifecycle(state,'health_ok')` (Phase 4).

### Fail-closed

Spawn fails / binary missing / integrity not yet verified → no spawn, lifecycle stays `stopped`, `companionAvailable` false. Health probe fails the retry budget → `transitionLifecycle(state, 'health_fail')` → kill the child → `stopped`.

---

## 6. Decision D5.5: Download adapter + Phase 4 integrity wiring

> Question: how does the real TLS download feed bytes into `createIntegrityAccumulator`; where does
> `finalize()` live; who decides the model spec (`allowedSourceUrls`, `expectedDigest`,
> `expectedSizeBytes`)?
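
To make the three spec fields in the question concrete, here is a minimal stand-in for the kind of check a spec has to pass before any byte is fetched: `https:` only, allowlisted source, a 64-character lowercase-hex digest, and a positive integer size. `checkModelSpec` and its reason codes are hypothetical and are not the Phase 4 `validateSourceUrl`/`validateIntegritySpec` implementations; exact-string allowlist matching is an assumption, and the URL and digest are placeholders.

```javascript
// Hypothetical stand-in (not the Phase 4 validators): returns a fixed
// reason code on failure and never echoes the URL or digest back.
function checkModelSpec(spec) {
  const { sourceUrl, allowedSourceUrls, expectedDigest, expectedSizeBytes } = spec ?? {};
  let parsed;
  try {
    parsed = new URL(sourceUrl); // throws on missing or malformed input
  } catch {
    return { ok: false, reason: 'source_url_malformed' };
  }
  if (parsed.protocol !== 'https:') {
    return { ok: false, reason: 'source_url_not_https' };
  }
  // Assumption: the allowlist is matched by exact string equality.
  if (!Array.isArray(allowedSourceUrls) || !allowedSourceUrls.includes(sourceUrl)) {
    return { ok: false, reason: 'source_url_not_allowlisted' };
  }
  if (typeof expectedDigest !== 'string' || !/^[0-9a-f]{64}$/.test(expectedDigest)) {
    return { ok: false, reason: 'digest_malformed' };
  }
  if (!Number.isSafeInteger(expectedSizeBytes) || expectedSizeBytes <= 0) {
    return { ok: false, reason: 'size_malformed' };
  }
  return { ok: true };
}

const spec = {
  sourceUrl: 'https://models.example/model.gguf', // placeholder URL
  allowedSourceUrls: ['https://models.example/model.gguf'],
  expectedDigest: 'a'.repeat(64), // placeholder digest
  expectedSizeBytes: 4096,
};
console.log(checkModelSpec(spec).ok); // true
console.log(checkModelSpec({ ...spec, sourceUrl: 'http://models.example/model.gguf' }).reason); // source_url_not_https
```

Every failure branch denies, so a spec that is missing, partial, or mistyped never reaches the download step.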

### Verified state

Phase 4 `createIntegrityAccumulator({ expectedDigest, expectedSizeBytes, sourceUrl, allowedSourceUrls })` exposes `update(chunk)` / `finalize()` / `getReceivedBytes()` / `abort()`; `validateSourceUrl` enforces `https:`-only + allowlist; `validateIntegritySpec` enforces 64-char lowercase-hex digest + positive integer size; `finalize()` uses constant-time digest compare + exact size. Phase 4 §3 fixes the single path to `ready`: validate spec → accumulate → `finalize().ok` → `start` → spawn → `health_ok`. `RuntimeAdapterFns.download(url, onChunk)` is the dumb byte pump.

### Decision: **Dumb download adapter; accumulator + `finalize()` owned by the orchestrator; trust anchor is a first-party manifest**

1. **Adapter is a dumb, verifying-transport byte pump.** `download(url, onChunk)`:
   - Performs an **HTTPS-only** GET with **full TLS verification** (`rejectUnauthorized` stays true; **no** insecure flag, ever). The `url` MUST have already passed `validateSourceUrl` against the spec's `allowedSourceUrls`.
   - **Streams**: calls `onChunk(chunk)` for **every** received byte, in order, no buffering of the whole file (multi-GB models must not be loaded into RAM).
   - **Makes no integrity decision.** It does not compute or compare the digest; it cannot "report success." This separation means a compromised download adapter cannot fake verification; the decision lives elsewhere.
2. **Where `finalize()` lives: the orchestrator, not the adapter.** The companion's **runtime-manager orchestration layer** (the code wiring Phase 4's pure core to the real adapters) owns the accumulator:
   - Before the download: `validateSourceUrl` + `validateIntegritySpec` (fail-closed), then `createIntegrityAccumulator({...spec})`.
   - During: `await adapter.download(spec.url, (chunk) => acc.update(chunk))`.
   - After the stream ends: `const verdict = acc.finalize()`.
   - **`verdict.ok === true` is the precondition** for `transitionLifecycle(state,'start')` and spawn.
     On `verdict.ok === false`: **delete the downloaded file**, log only the fixed reason code, and **refuse to spawn** (lifecycle stays `stopped`). This is exactly Phase 4 §3 / §5.1.
3. **Atomic, TOCTOU-aware file handling.** Download to a **temp path in a companion-private directory** (`0700`), `fsync`, verify via `finalize()`, then **atomically rename** into the verified model path. **Spawn only from the verified path.** To avoid a verify-then-swap (TOCTOU) where another same-user process replaces the file between verify and spawn, keep the verified file in the companion-private directory and prefer launching from a held descriptor / re-stat the inode identity at spawn; treat any mismatch as integrity failure.
4. **Who decides the spec (the trust anchor; most security-critical answer):** the model spec (`allowedSourceUrls`, `expectedDigest`, `expectedSizeBytes`) comes from a **first-party, signed model manifest** that is **independent of the model download channel**. Concretely, the `expectedDigest` must originate from a source the **download/CDN attacker does not control**:
   - **Preferred:** a manifest **fetched from the Knowtation hosted gateway over TLS** (a trusted first-party origin, distinct from the model CDN), and/or **baked into the signed companion** (Phase 7). The manifest fetch is performed by the **authority group** (it may carry the JWT); only the **plain resolved values** (`url`, `digest`, `size`) are then handed to the **runtime group**; the download adapter never sees the JWT (D5.8).
   - **Forbidden:** taking the digest/size from the **same host that serves the model bytes** (circular trust: an attacker who controls the CDN would control both bytes and "expected" digest), or from **user-supplied** input, or from an unauthenticated channel.
   - Manifest authenticity itself (signature scheme, key rotation) is the supply-chain detail; the binding decision here is: **the trust anchor is first-party and out-of-band from the model host, fail-closed if it cannot be authenticated.**

### Fail-closed

No authenticated manifest → no download. `validateSourceUrl`/`validateIntegritySpec` fail → no download. Stream error mid-flight → `acc.abort()` → `finalize()` returns `accumulator_aborted` → delete temp, no spawn. `finalize().ok === false` (size/digest mismatch) → delete, no spawn.

---

## 7. Decision D5.6: Resource-probe adapter

> Question: what OS APIs; what privacy risk (does probing VRAM expose what other apps are doing); is
> it acceptable?

### Verified state

Phase 4 `evaluateResourceLimits(observation, limits)` consumes a `ResourceObservation` `{ ramBytes, vramBytes, cpuPercent }` and fails closed on malformed input. `RuntimeAdapterFns.statResources()` is the deferred probe. Phase 4 §4 suggests caching the observation ≤ 500 ms.

### Decision: **Probe the runtime's own PID; VRAM as aggregate headroom only; never enumerate other processes; no privilege escalation**

1. **OS APIs (scoped to the runtime PID):**
   - **RAM (RSS):** macOS `proc_pidinfo`/`task_info`; Linux `/proc/<pid>/statm` or `smaps_rollup`; Windows `GetProcessMemoryInfo`. **Keyed on the runtime child's PID only**, never system-wide RAM.
   - **CPU%:** per-process CPU time deltas for the runtime PID (same per-OS sources), sampled over an interval.
   - **VRAM:** the privacy-sensitive one. There is no portable per-process VRAM accounting without vendor tooling (`nvidia-smi`, Metal/`IOKit` counters, DXGI). `nvidia-smi --query-compute-apps` reports per-process VRAM **across all GPU processes**, which would expose **other applications'** PIDs/names/footprints.
2. **Privacy risk + decision (confirmed):** probing VRAM **can** expose what other apps are doing on the GPU.
   Therefore Phase 5 **MUST**:
   - Read VRAM as **aggregate device headroom** (total/free) **or** the runtime PID's own usage, and **discard everything else immediately**. Other processes' PIDs, names, and usage are **never parsed into the observation, never logged, never persisted**.
   - **Never enumerate** the GPU process table for any purpose beyond extracting the runtime's own line / the aggregate scalar.
   - **No privilege escalation for telemetry.** If per-process VRAM requires elevated rights, **skip it** (treat `vramBytes = 0` / `maxVramBytes = Infinity`) rather than escalate. Telemetry must never be a reason to ask for more OS privilege than inference needs.
3. **Acceptable? Yes**, under (1)+(2): scoped to the runtime PID, aggregate-only VRAM, no enumeration, no escalation, nothing about other apps retained. This keeps resource enforcement (the OOM defense, Phase 4 threat **b**) while honoring the design gate's privacy posture.
4. **Cache** the observation ≤ 500 ms (Phase 4 §4) to bound syscall overhead.

### Fail-closed

Probe fails / returns malformed → `evaluateResourceLimits` returns `malformed_observation` → the per-request gate **denies** inference (refuse-rather-than-run-blind into OOM). A self-inflicted denial is the safe outcome; the alternative (running blind) risks the user's whole machine.

---

## 8. Decision D5.7: Phase 1 seam activation (`companionAvailable`)

> Question: when exactly does `companionAvailable` flip to true? (Must be **after** integrity verified
> **and** lifecycle `ready` **and** a health round-trip succeeds.)

### Verified state

Phase 1 `lib/model-runtime-lane.mjs`: `selectLane` returns `'local'` when `inBrowserAvailable || companionAvailable`; `companionAvailable` defaults **false** (fail-closed) and is documented as "set by Phase 5 only when `canServeInference(lifecycle)` is true." Phase 4 §3 step 8 + §5.3 step 10 fix the ordering: set true only after `health_ok`, false on drain/stop.
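
The ordering above (true only after integrity verification, lifecycle `ready`, and a successful health round-trip; false on drain/stop) can be pictured as a pure predicate. The following is a minimal sketch and not companion shell code: it performs no I/O. The function name `computeCompanionAvailable`, the state field names, and the `HEALTH_MAX_AGE_MS` value are all assumptions made for illustration; `lifecycleReady` stands in for `canServeInference(lifecycle) === true`, and the staleness check anticipates the recency bound in the decision that follows.

```javascript
// Illustrative sketch only. All names and the threshold value are hypothetical;
// this document does not fix a recency threshold.
const HEALTH_MAX_AGE_MS = 5000;

// Pure predicate: returns true only when every condition is positively
// confirmed. Any missing, malformed, or stale input yields false.
function computeCompanionAvailable(state, nowMs) {
  if (!state || typeof state !== 'object') return false;
  if (typeof nowMs !== 'number' || !Number.isFinite(nowMs)) return false;

  // Integrity must have been verified for the running model.
  if (state.integrityVerified !== true) return false;

  // Stand-in for canServeInference(lifecycle) === true.
  if (state.lifecycleReady !== true) return false;

  // A health round-trip must have succeeded, and recently.
  const last = state.lastHealthOkAtMs;
  if (typeof last !== 'number' || !Number.isFinite(last)) return false;
  const ageMs = nowMs - last;
  if (ageMs < 0 || ageMs > HEALTH_MAX_AGE_MS) return false;

  return true;
}

// Usage: all conditions hold and the last health success is 1 s old.
const example = { integrityVerified: true, lifecycleReady: true, lastHealthOkAtMs: 1000 };
console.log(computeCompanionAvailable(example, 2000)); // true
console.log(computeCompanionAvailable({ ...example, lifecycleReady: false }, 2000)); // false
```

Every branch returns `false` unless its input is positively confirmed, which mirrors the fail-closed default of the flag.
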
### Decision: **True only when ALL of {integrity-verified ∧ lifecycle `ready` ∧ recent health round-trip} hold; false on any doubt**

1. **Flip to `true` only when every condition holds simultaneously:**
   - **Integrity verified:** the model's `acc.finalize().ok === true` for the running model (D5.5).
   - **Lifecycle `ready`:** `canServeInference(lifecycle) === true`, reached **only** via `stopped → starting → ready` on a successful `health_ok` (Phase 4: there is no direct `stopped → ready`).
   - **Real health round-trip:** at least one **end-to-end** probe **through the guarded front-door** (loopback token presented, admitted, runtime answered correctly) has succeeded, not merely "the process spawned."
   - Plus: the front-door listener is bound and the loopback token is stored (`rotateLoopbackToken` done), so an admitted caller can actually be served.
2. **Recency bound (anti-staleness, P-i):** `companionAvailable` must be backed by a **recent** successful health round-trip. Phase 5 re-probes on an interval; if the last success is older than the threshold (or a probe fails), treat the flag as **false** until re-confirmed. This stops `selectLane` from routing to a silently-dead runtime.
3. **Flip to `false` immediately on any of:** `drain`/`stop` (Phase 4 `transitionLifecycle`), `health_fail`, a resource-limit-triggered drain, detected runtime crash/exit, keychain/loopback-token loss, or companion shutdown. **Default and ambiguity → false.** Never set true optimistically.
4. **Scope of the flag.** `companionAvailable` means "**local inference is reachable and ready on this device**." It does **not** assert auth/consent; write-back of enrichment still passes Phase 1's `enforceConsentPolicy` and needs a valid session (D5.3 custody `decide() !== 'reauth'`). Keeping the flag scoped to inference-readiness avoids conflating lane *capability* with lane *permission*.
5. **Mechanism.** Phase 5's binding layer computes the flag strictly from `canServeInference(lifecycle)` ∧ recent-health-ok and writes it into the **live `LaneCapabilities`** passed to `selectLane`. The value is never cached beyond the recency bound.

### Fail-closed

Any missing condition, stale health, or ambiguity → `companionAvailable = false` → `selectLane` falls through the chain (in-browser → self_hosted → … → `disabled`), exactly the D2.2 fallback.

---

## 9. Decision D5.8: No-ambient-authority enforcement mechanism

> Question: what mechanism prevents the spawn/download adapter from **ever** holding a vault handle,
> JWT, or keychain read capability?

### Verified state

Phase 4 already guarantees the **decision core** imports no vault/canister/keychain/auth module and that `RuntimeAdapterFns` carries no authority accessor (structural). Design gate §4.6 forbids ambient authority on the loopback endpoint. The gap Phase 5 closes: the **real adapters and the wiring layer** must preserve that separation when authority objects (keychain, JWT) finally exist in the same process.

### Decision: **Object-capability segregation, enforced by tests, not convention**

1. **Two disjoint capability groups, constructed in separate scopes:**
   - **Authority group** (holds secrets/handles): the **OS-keychain adapter** (D5.3), the **OAuth/session controller** (JWT + refresh via `companion-token-custody` / `companion-oauth-pkce`), and the **canister/vault client**. Instantiated and held **only** by the session/auth controller.
   - **Runtime group** (no secrets): the `RuntimeAdapterFns` implementations (`spawn`, `download`, `healthCheck`, `statResources`). Constructed with **no reference** to any authority-group object.
2. **The runtime group receives only inert data:** a verified **file path**, a **port**, a validated **URL**, and **resource limits**. It is **never** passed the keychain adapter, the JWT, the refresh token, or a canister handle.
   The model **manifest fetch** (which may need the JWT) is done by the **authority group**, which hands the runtime group only the resolved `{ url, digest, size }` (D5.5), so the download adapter sees a URL, never a token.
3. **Environment scrub is part of the capability boundary** (D5.4): the spawned child's env is a minimal allowlist with `SESSION_SECRET`, `*_API_KEY`, JWT, refresh/loopback tokens, and keychain references **removed**, closing the env-as-ambient-authority leak that an import-graph check alone would miss.
4. **Enforced, not merely intended.** Phase 5's 7-tier suite MUST include:
   - an **architecture/import test** asserting the runtime-adapter module and `companion-runtime-manager` import **none** of `{ companion-token-custody, companion-oauth-pkce, keychain backend, canister/vault client }`;
   - a **child-env-scrub security test** asserting the spawned process environment contains **none** of the secret-bearing keys;
   - a **surface test** asserting `RuntimeAdapterFns` and `SpawnOpts`/`SpawnHandle` expose no authority accessor (Phase 4 already; re-assert against the real impl);
   - a **download-adapter test** asserting it receives only a URL + chunk sink, never a token.
5. **The Phase 2 guard remains the only admission path** and its verdict the only output: an admitted inference request reaches the runtime and nothing else; it cannot pivot to vault/JWT because those handles do not exist anywhere reachable from the runtime group (structural, now test-enforced).

### Fail-closed

If the wiring cannot construct the runtime group without an authority reference (e.g., a refactor introduces a shared singleton), the architecture test **fails the build**: the merge is blocked, not shipped with a warning.

---

## 10. How Phase 5 discharges the prior phases' deferred obligations

| Source obligation | Discharged by |
| --- | --- |
| Phase 2 §6 (1) loopback bind, (2) ephemeral port, (4) `allowedHosts` from bound port, (7) no permissive CORS | D5.1 |
| Phase 2 §6 (3) CSPRNG per-session token to keychain | D5.3 + Phase 3 `rotateLoopbackToken` |
| Phase 3 §6 (1) system browser, (2) loopback redirect bind + ephemeral port, (4) callback validation w/ `expectedIssuer`, (5) TLS token POST, (6) keychain custody, (7) refresh drive, (8) loopback-token rotate | D5.2 (browser, redirect bind, callback), D5.3 (keychain), and the orchestration calling Phase 3's pure descriptors |
| Phase 4 §5.1 download + integrity, §5.2 spawn, §5.3 health loop, §5.4 per-request gate, §5.6 minimal logging, §5.7 no ambient authority | D5.5 (download/integrity), D5.4 (spawn/health), D5.6 (resource probe), D5.8 (no ambient authority) |
| Phase 4 §3 step 8 / §8 G2: Phase 1 seam activation | D5.7 |
| Server-side OAuth gate (✅ DONE): native client at `/api/v1/auth/native`, `iss` emission, loopback variable-port, scope ceiling | Consumed by D5.2 (the companion is the native client; passes `expectedIssuer`, binds the loopback redirect) |

**Remaining external dependency:** none for first-party run-from-source. The server-side gate (G1) is DONE; this document is G2 (Phase 4 §8). Phase 5 implementation may proceed against this contract.

---

## 11. 7-tier test obligations (Phase 5 bind/lifecycle layer)

Aaron's Rule #0. The pure cores' suites (Phases 2–4: 102 + 100 + 35 + 219 cases) do **not** absolve the bind layer of its own tests.
Before any merge to `main`, the Phase 5 shell ships all seven tiers:

| Tier | Focus |
| --- | --- |
| **Unit** | Bind helpers (loopback-only assertion, ephemeral-port allocation, `allowedHosts` from bound port); keychain adapter per backend (get/set/delete on the four accounts; unknown-account reject; no-list surface); spawn-opts hardening (absolute path, `shell:false`, argv, env-scrub); download adapter HTTPS-only + chunk pump; resource probe PID-scoping; `companionAvailable` predicate. |
| **Integration** | OAuth PKCE loopback round-trip against the native provider (browser-open stubbed) → keychain custody; download → `createIntegrityAccumulator` → `finalize()` → `start` → spawn → `health_ok` → `companionAvailable=true`; guard-in-front-of-runtime request path; refresh rotation → `updateAccessToken`; reuse → `clearSession`. |
| **End-to-end** | Sign in → fetch manifest (first-party) → download → verify → spawn → enrich a note locally via the guarded front-door → result handled per §5/D3 policy; failure branches: integrity fail (no spawn), health fail (`stopped`), keychain locked (`reauth`). |
| **Stress** | Concurrent inference through the front-door at `maxInFlight`/`queueBound`; many auth attempts (redirect-listener bind/teardown churn); stale-runtime reclaim across forced restarts; large streamed download to the accumulator. |
| **Data-integrity** | Provenance fields on derived artifacts (deferred write-back is Phase 6, but the inference result's metadata is asserted); `finalize()` rejects 1-bit corruption end-to-end; loopback token rotates each start (old token inert); no secret persisted outside the keychain. |
| **Performance** | Front-door admission overhead bound; runtime cold-start; resource-probe ≤ 500 ms cache honored; no event-loop starvation under streamed download. |
| **Security (centerpiece)** | Loopback-only bind (reject `0.0.0.0`/`::`/routable); ephemeral-port not used as a control; DNS-rebinding + cross-origin still 403 at the front-door; runtime back-end carries no authority and emits no CORS; keychain read surface minimal + device-local (no iCloud sync); **child-env contains no secret**; **architecture/import test: runtime group imports no authority module**; manifest trust-anchor is out-of-band from the model host; resource probe never enumerates other GPU processes; `companionAvailable` fail-closed; **no secret in any log/error/redirect/adapter interface**; global fail-closed posture. |

---

## 12. Constraints honored

- **Decisions only; no companion shell code.** This document writes none; it fixes the contract the implementation must obey.
- **Muse-canonical**, on `feat/companion-app`, paired with the Phase 1–4 code already there, **not** a docs-only PR to `main`.
- **Fail-closed on every ambiguous design point** (bind, custody, spawn, download, probe, seam).
- **Security first; no ambient authority; no secret in any log, error, or adapter interface.**
- **No assumptions stated as fact:** every cross-reference is anchored to a verified file/section in Phases 1–4 and the server-side OAuth gate.
- **Phase 5 lifts only the bounded I/O subset of the design gate's "DOES NOT approve" list (§0);** packaging/signing/notarization/auto-update remain Phase 7 and are not approved here.

---

## 13. Approval table

| Decision | Recommendation | Owner approval |
| --- | --- | --- |
| **D5.1** (inference loopback bind): OS-assigned ephemeral, loopback-only, port-secrecy not a control, no `SO_REUSEPORT`, two-listener separation | **ACCEPT** | ☐ pending |
| **D5.2** (OAuth redirect): separate, one-shot, ephemeral loopback listener; pass `expectedIssuer`; system browser only | **ACCEPT** | ☐ pending |
| **D5.3** (keychain adapter): `get`/`set`/`delete` on four fixed accounts only; device-local (macOS ThisDeviceOnly / Windows per-user DPAPI / Linux libsecret w/ documented limit); no plaintext fallback | **ACCEPT** | ☐ pending |
| **D5.4** (spawn adapter): `spawn`/`kill`/`healthCheck`; absolute path, `shell:false`, argv, env-scrub, process-group; runtime back-end via UDS `0600` (else loopback TCP, no authority); detect-and-reclaim stale runtime | **ACCEPT** | ☐ pending |
| **D5.5** (download adapter): dumb HTTPS-only byte pump; accumulator + `finalize()` owned by orchestrator; trust anchor = first-party signed manifest, out-of-band from the model host; atomic temp→verified, TOCTOU-aware | **ACCEPT** | ☐ pending |
| **D5.6** (resource probe): runtime-PID-scoped; VRAM aggregate-only, never enumerate other processes; no privilege escalation; fail-closed deny on probe failure | **ACCEPT** | ☐ pending |
| **D5.7** (seam activation): `companionAvailable` true only when integrity-verified ∧ `ready` ∧ recent health round-trip; recency-bounded; false on any doubt | **ACCEPT** | ☐ pending |
| **D5.8** (no ambient authority): object-capability segregation (authority vs runtime groups), env-scrub, enforced by architecture/import + env-scrub tests (build-blocking) | **ACCEPT** | ☐ pending |

On owner approval of D5.1–D5.8, the **Phase 5 implementation** (companion shell, run-from-source) is unblocked, itself gated on the §11 7-tier test obligation before any merge to `main`.
**Phase 7** (packaging, signing, notarization, auto-update integrity) remains a separate, later gate and is **not** approved by this document.