Companion App — Phase 5: Bind Gate (sockets · spawn · keychain · download)
Status: 🧠 Thinking design gate — RATIFICATION REQUESTED. This document makes design
decisions only. No companion shell code is written or approved to run by this document. It fixes
the contract under which the Phase 5 implementation may, for the first time, open a real socket,
spawn a real process, read the real OS keychain, and perform a real TLS download.
Branch: feat/companion-app (Muse-canonical; paired with the Phase 1–4 code already on this
branch — not a docs-only PR to main, per the owner's no-docs-only-PR-to-main policy).
Phase table ref: Gate §12, Phase 5 ("companion app shell, integrating phases 2–4"). The phase
table marks the implementation ⚡ Sonnet/auto; this bind-gate design is elevated to 🧠 Thinking
because it is the first phase that performs ambient-authority-bearing I/O, and a wrong seam here is an
ambient-authority vulnerability, a supply-chain compromise, or an OS-permission overreach. Mirrors
the 🔀 Hybrid pattern: design the seam with a thinking model, implement against the fixed contract
with Sonnet/auto.
Depends on (all accepted):
COMPANION-APP-DESIGN-AND-AUTHORIZATION-GATE.md
(§4 the eight loopback controls, §4.6 no ambient authority, §7 packaging, §12 phase table, the
"DOES NOT approve" list); Phase 0 (gate §13, D1–D3); Phase 1
(COMPANION-APP-PHASE-1-ADAPTER-SEAM.md,
lib/model-runtime-lane.mjs); Phase 2
(COMPANION-APP-PHASE-2-LOOPBACK-SECURITY.md,
lib/companion-loopback-guard.mjs); Phase 3
(COMPANION-APP-PHASE-3-OAUTH-PKCE.md,
lib/companion-oauth-pkce.mjs, lib/companion-token-custody.mjs); Phase 4
(COMPANION-APP-PHASE-4-RUNTIME-MANAGER.md,
lib/companion-runtime-manager.mjs); the server-side OAuth gate ✅ DONE
(COMPANION-APP-OAUTH-SERVERSIDE-GATE.md,
hub/gateway/native-oauth-provider.mjs).
Simple summary
Until now, every piece of the companion app has been built as pure rules with no hands — it can decide whether a request is safe, whether a sign-in is valid, whether a downloaded model is genuine, and whether the runtime may serve a request, but it cannot actually open a door, start a program, read your password vault, or download a file. This phase is where the rules grow hands.
That is the single most dangerous step in the whole project, so this document does not write any of that code. It writes the rulebook the hands must obey — and argues every rule against a specific attacker:
- The two doors (one for local AI requests, one for the brief sign-in reply) open on a random internal number, on your machine only, never to the outside network.
- The password vault (OS keychain) is touched through the narrowest possible opening: store one secret, read one secret, delete one secret — nothing that can list or dump everything.
- Starting the AI program is done in a locked-down way: exact program, no shell, a stripped-down environment so the AI program can never see your sign-in token or vault, and it dies when the companion dies.
- Downloading a model checks the file's fingerprint against a trusted record before the program is ever run on it, and the trusted record comes from somewhere the download server can't forge.
- Checking how much memory/GPU is in use is scoped to the AI program only, and must never spy on what your other apps are doing.
- The "AI is ready on this device" switch only flips on after the file is verified, the program is genuinely up, and a real round-trip check succeeds — and flips off the instant anything is wrong.
- Finally: the parts with hands (start a program, download a file) are physically built so they can never hold your sign-in token, your vault handle, or the keychain — they only get a file path, a port, and a URL.
Technical summary
Phases 1–4 and the server-side OAuth gate delivered every decision core and the one server-side
route (/api/v1/auth/native) the companion needs. Each pure module deferred its real I/O to "Phase 5
behind an explicit gate." This document is that gate. It ratifies eight decisions (D5.1–D5.8).
They cover the four I/O-bearing seams that converge in the companion shell: socket bind (the
Phase 2 inference listener and the Phase 3 OAuth loopback redirect listener), the OS-keychain
adapter (custody of the JWT, refresh token, and Phase 2 loopback token), the spawn adapter (the
bundled Ollama/llama.cpp runtime), and the download adapter (TLS model fetch wired into Phase 4's
createIntegrityAccumulator). They also cover the resource-probe adapter, the Phase 1 seam
activation rule (companionAvailable), and the no-ambient-authority enforcement mechanism.
Every decision defaults fail-closed: any ambiguity at bind, custody, spawn, download, or probe
denies rather than proceeds. No secret (loopback token, JWT, refresh token, authorization code,
code_verifier, SESSION_SECRET, model digest) appears in any log, error, adapter interface, or
redirect. The implementation that follows this gate runs from source on a first-party machine and
must ship its own 7-tier suite for the bind/lifecycle layer before any merge to main; packaging,
code-signing, notarization, and auto-update remain Phase 7 and are not approved here.
0. What this gate lifts — and what it deliberately does not
The design gate's "DOES NOT approve (no code)" list named, among others, "opening any new local HTTP listener / loopback model endpoint" and "shipping any companion binary … or bundled runtime." Phase 5 is the gate that lifts a bounded subset of that list — and only that subset.
| Item from the design gate's "DOES NOT approve" list | Phase 5 disposition |
|---|---|
| Opening a new local HTTP listener / loopback model endpoint | LIFTED for the two loopback listeners specified in D5.1/D5.2 (inference + OAuth redirect), bound 127.0.0.1/[::1] only, under the Phase 2 guard. |
| Running an integrated companion process that binds sockets, spawns the runtime, reads the keychain, downloads a model | LIFTED for first-party run-from-source on the owner's/developer's machine, under D5.1–D5.8. |
| Shipping a companion binary, tray installer, auto-updater, code-signing/notarization | NOT lifted — remains Phase 7 (distribution gate). Phase 5 runs from source; it does not produce or distribute a signed artifact. |
| New canister routes / new Hub REST endpoints / wire-protocol changes | NOT needed — the one required server route (/api/v1/auth/native) was delivered by the server-side OAuth gate (✅ DONE). Phase 5 adds none. |
| New derived-artifact storage paths / encryption scheme | NOT lifted — remains Phase 6. Phase 5 performs inference; enrichment write-back storage is Phase 6. |
| Any change to OAuth client registration or scopes | NOT lifted / already bounded — the server-side OAuth gate fixed the native client + scope ceiling (scopesForRole). Phase 5 only consumes it as a client; it changes nothing server-side. |
Net: Phase 5 means "the rules may now have hands, on a first-party machine, under the contract below." It does not mean "ship a product." Distribution is a separate, later gate.
1. Adversarial threat model (the bind surface)
Phases 2 and 3 modelled the request and the protocol. Phase 5 adds the physical I/O, so the threat model expands to the host itself. Each attacker is paired with the exact Phase 5 control that stops it, argued against the attacker — not pattern-matched.
| # | Attacker capability | Exact Phase 5 control | Decision |
|---|---|---|---|
| P-a | Malicious web page issues fetch() to the companion's loopback listener (and to the runtime's internal port). | Phase 2 guard on the front-door (token + Host + Origin/Sec-Fetch-Site + no permissive CORS); the runtime's internal port emits no CORS and carries no authority (a no-cors POST can spend compute but can never read a response cross-origin or reach data). | D5.1, D5.4 |
| P-b | DNS-rebinding to reach the loopback bind. | Loopback-only bind (127.0.0.1/[::1], never 0.0.0.0/::) + Phase 2 Host allowlist. Ephemeral port is not relied on as a control. | D5.1 |
| P-c | Local same-user malware connects directly to the listener or the runtime's internal port, bypassing the front-door. | Per-session bearer token on the front-door (Phase 2); the runtime back-end is reachable only via a Unix-domain socket (mode 0600) where supported, else a loopback TCP port that carries inference only, no authority and no secret, so direct access spends local compute but exfiltrates nothing. Rate-limited. | D5.1, D5.4, D5.8 |
| P-d | Local process races to bind the OAuth redirect port / inference port, or steals it after the companion exits. | OS-assigned ephemeral port chosen per use; redirect listener is one-shot (bound only for one auth attempt, closed after one callback); no SO_REUSEPORT; stale-runtime reclaim on start; loopback token rotated each start so a stale listener is inert against the new session. | D5.1, D5.2, D5.4 |
| P-e | Keychain read by another same-user process (or a backup/cloud-sync exfil). | Narrowest adapter surface (get/set/delete on four fixed accounts; no enumerate/dump); macOS …ThisDeviceOnly accessibility (no iCloud sync); ACL bound to the app where the OS supports it; the loopback token is separate, per-session, rotated so its compromise is not the JWT's compromise. Linux Secret Service's weaker same-user isolation is documented as a known platform limitation. | D5.3 |
| P-f | Supply-chain: poisoned model file from a compromised CDN/mirror or MITM (valid TLS, malicious bytes). | Phase 4 createIntegrityAccumulator: SHA-256 + exact size, constant-time digest compare, before spawn. The trust anchor (expectedDigest) comes from a first-party signed manifest, not the model host, so the entity that controls the download cannot also forge the digest. Atomic temp→verified move; spawn only from the verified path. | D5.5 |
| P-g | Escaped/orphaned runtime keeps running, holding a port or compute, after the companion crashes; or a poisoned runtime attempts RCE→data exfil. | Spawn in the companion's process group (dies with it); stale-runtime detect-and-kill on start; scrubbed environment (no SESSION_SECRET/JWT/keychain in the child env); no shell, argv array, absolute binary path; the runtime is handed only a file path + port + resource flags, never a vault/JWT/keychain handle. | D5.4, D5.8 |
| P-h | Resource-probe privacy leak: GPU telemetry exposes other apps' processes/VRAM. | Probe is scoped to the runtime's own PID; VRAM read as aggregate headroom only; other processes' PIDs/names/usage are never enumerated, logged, or persisted; no privilege escalation for telemetry (skip VRAM rather than escalate). | D5.6 |
| P-i | Premature lane selection: the local lane is chosen against a runtime that is downloading, starting, dead, or unverified. | companionAvailable flips true only after integrity verified and lifecycle ready and a real health round-trip succeeded, backed by a recency bound; flips false immediately on drain/stop/health-fail/crash. Fail-closed default false. | D5.7 |
| P-j | Ambient-authority pivot: the spawn/download/probe path acquires a vault handle, the JWT, or keychain capability and exfiltrates. | Object-capability segregation: the authority group (keychain + OAuth/session + canister) and the runtime group (spawn/download/health/stat) are constructed in disjoint scopes with no shared reference; enforced by an architecture/import test + a child-env-scrub test. | D5.8 |
| P-k | Secret in logs/errors at any I/O boundary (the new, dangerous surface). | All adapters log fixed reason codes only; never the token, JWT, refresh token, code, code_verifier, digest, URL, or model path. Reuses Phases 2–4's "no secret in output" posture at the I/O layer. | all |
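
The "fixed reason codes only" control in row P-k can be sketched as below. This is a minimal illustration, not the project's logger: `logReason`, the `sink` array, and the example codes are all hypothetical names chosen for the sketch.

```javascript
// Hypothetical reason-code logger for P-k; the codes listed are examples only.
const REASON_CODES = new Set([
  'bind_failed',
  'keychain_unavailable',
  'integrity_failed',
  'health_fail',
]);

// Records a fixed code and nothing else. Any value outside the fixed set
// collapses to one generic code, so a token, URL, or path passed by mistake
// can never reach the log sink.
function logReason(sink, code) {
  const safe = typeof code === 'string' && REASON_CODES.has(code) ? code : 'unknown_reason';
  sink.push(safe);
  return safe;
}
```

Because the function takes no free-form message argument, a caller has no channel through which to attach a secret to a log line.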
2. Decision D5.1 — Inference loopback socket bind contract
Question: what port-selection strategy for the loopback inference listener, and what is the threat model for port fixation?
Verified state
Phase 2 §6 already specifies the bind obligations the guard cannot enforce by itself: listen(0, '127.0.0.1'), never 0.0.0.0; non-predictable ephemeral port; build allowedHosts from the actual
bound port; never emit permissive CORS. Phase 4 §5.2 repeats the loopback-bind + ephemeral-port
requirement for the runtime. The Phase 2 guard (verifyLoopbackRequest) is the admission control;
the bind is the deferred I/O.
Decision — OS-assigned ephemeral port, loopback-only, port secrecy is NOT a control
- Bind `listen(0, '127.0.0.1')`: the OS assigns an ephemeral port. If IPv6 loopback is offered, bind `[::1]` explicitly; never bind `0.0.0.0`, `::`, or any routable interface. A bind that can only be satisfied on a non-loopback interface is a hard startup abort, never a fallback to a broader interface.
- Port fixation threat model (confirmed): an ephemeral port raises the cost of blind probing, but a local process can scan the ~16k-wide ephemeral range in milliseconds, and a web page can fire no-cors requests across it. Therefore the ephemeral port is defense-in-depth only; the real controls are the Phase 2 bearer token + `Host` allowlist + `Origin`/`Sec-Fetch-Site` check + rate limit. The design must never treat the port as a secret or as an access control. (This is why a fixed well-known port is rejected: it gives an attacker a free, stable target and a fingerprint, for zero security benefit.)
- Exclusive ownership: do not set `SO_REUSEPORT` (it would let a second local process bind the same port and split/hijack traffic). The listener owns its port for the session.
- Port persistence: the bound port may be recorded for the local session only, in a user-private location (file mode `0600` / per-user keychain-adjacent store), never world-readable and never logged at info level. It is not a secret, but minimizing its broadcast is free defense-in-depth.
- Two-listener separation (critical, see D5.4): the guarded front-door the companion exposes is distinct from the runtime's own internal listener. Both are loopback; the front-door enforces the Phase 2 guard, the back-end carries no authority. Direct access to the back-end must never bypass an authority boundary (it bypasses only the rate limiter and serves inference, which is acceptable because the back-end holds no secret and no data path; see D5.4/D5.8).
Fail-closed
Cannot bind loopback → abort startup (no listener, companionAvailable stays false). Port-store write
fails → continue (the port is not a secret) but never broaden the bind to compensate.
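
A minimal sketch of the D5.1 bind contract, assuming a Node.js `http`/`net`-style server object is injected. The helper names (`isLoopbackAddress`, `buildAllowedHosts`, `bindLoopbackListener`), the `host:port` allowlist format, and the reason-code strings are hypothetical; only the `listen(0, '127.0.0.1')` call, the `[::1]` alternative, and building `allowedHosts` from the actual bound port come from the text.

```javascript
// Hypothetical helpers illustrating D5.1; names are not taken from the codebase.
const LOOPBACK_ADDRESSES = new Set(['127.0.0.1', '::1']);

// True only for the two loopback literals the gate permits.
function isLoopbackAddress(address) {
  return LOOPBACK_ADDRESSES.has(address);
}

// Build the Host allowlist from the port the OS actually assigned.
function buildAllowedHosts({ address, port }) {
  if (!isLoopbackAddress(address)) throw new Error('bind_not_loopback');
  if (!Number.isInteger(port) || port <= 0 || port > 65535) throw new Error('bind_bad_port');
  const host = address === '::1' ? '[::1]' : address;
  return [`${host}:${port}`];
}

// Bind an injected server to an OS-assigned loopback port (port 0).
// Any failure rejects; there is deliberately no fallback to a broader interface.
function bindLoopbackListener(server, host = '127.0.0.1') {
  return new Promise((resolve, reject) => {
    if (!isLoopbackAddress(host)) {
      reject(new Error('bind_not_loopback'));
      return;
    }
    server.once('error', () => reject(new Error('bind_failed')));
    server.listen(0, host, () => {
      try {
        const info = server.address();
        resolve({ port: info.port, allowedHosts: buildAllowedHosts(info) });
      } catch (err) {
        server.close(); // bound somewhere unexpected: release it and abort startup
        reject(err);
      }
    });
  });
}
```

The allowlist is derived after the bind rather than before it, so a port chosen by the OS can never disagree with the `Host` values the guard accepts.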
3. Decision D5.2 — OAuth redirect loopback listener bind
Question: same ephemeral strategy for the PKCE loopback redirect? Confirm it can be a different port from the inference socket.
Verified state
Phase 3 §6 requires the redirect listener bound listen(0, '127.0.0.1') (or [::1]), the
redirect_uri constructed from the actual bound port and re-validated by validateRedirectUri, and
the state discarded after one callback. The server-side gate (✅ DONE) confirmed the native provider
accepts any loopback ephemeral port at registration and exact-matches it within one flow (RFC 8252
§7.3 + RFC 6749 §4.1.3); native-oauth-provider.mjs isLoopbackUri enforces 127.0.0.1/[::1]
literals only and rejects localhost.
Decision — Separate, short-lived, one-shot ephemeral redirect listener
- Different listener, different port. The OAuth redirect listener is a distinct server from the inference listener, bound independently via `listen(0, '127.0.0.1')` (or `[::1]`). They are different listeners with different lifecycles, so they get different ports (confirmed and required). Confusing them (or reusing the inference port for the redirect) is forbidden.
- One-shot lifecycle. The redirect listener is bound only for the duration of a single authorization attempt and is torn down immediately after it receives exactly one callback (or on timeout/abort). This minimizes the window in which a local process (P-d) can race the port. PKCE already makes an intercepted code useless (Phase 3 threat a), but a one-shot listener also shrinks the race surface.
- Strict request handling. Accept only `GET` on the exact registered callback path; hand the query params to `validateAuthorizationResponse({ params, expectedState, expectedIssuer })`. The native provider now emits `iss` (server-side gate C3), so Phase 5 MUST pass `expectedIssuer` and treat a mismatch as fatal (full mix-up defense, Phase 3 threat c). No bearer-token check here: this listener authenticates by `state` + `iss` + PKCE, not by the loopback token.
- Per-attempt secrets in memory only. `code_verifier`, `state`, and `nonce` live in memory for the attempt; `state` is single-use and discarded after one callback. Never written to disk, never logged.
- System browser only. The authorization URL (from `buildAuthorizationUrl`) is opened in the OS default browser; an embedded webview is forbidden (RFC 8252 §8.12, Phase 3 threat i).
Fail-closed
Redirect listener cannot bind loopback → abort the auth attempt (no browser launch). Callback fails
validateAuthorizationResponse (bad state/iss/error) → abort, generic message, never log the raw
callback. Timeout with no callback → tear down the listener and surface a generic failure.
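
The one-shot, GET-only handling above can be sketched as a small pure function. This is an illustration under assumptions: `createOneShotCallback`, the returned `{ status, done, ok }` shape, and the choice not to consume the single shot on a non-matching request are hypothetical; `validate` stands in for Phase 3's `validateAuthorizationResponse`, whose result shape is assumed to carry an `ok` flag.

```javascript
// Hypothetical one-shot callback gate for the redirect listener (D5.2).
// `validate` stands in for validateAuthorizationResponse; result shape is assumed.
function createOneShotCallback({ callbackPath, expectedState, expectedIssuer, validate }) {
  let consumed = false;
  return function handle(method, rawUrl) {
    let url;
    try {
      url = new URL(rawUrl, 'http://127.0.0.1');
    } catch {
      return { status: 400, done: false };
    }
    // Only GET on the exact registered path counts as the callback.
    if (method !== 'GET' || url.pathname !== callbackPath) {
      return { status: 404, done: false };
    }
    if (consumed) return { status: 410, done: true };
    consumed = true; // state is single-use: burn it before validating
    const params = Object.fromEntries(url.searchParams);
    const result = validate({ params, expectedState, expectedIssuer });
    // Generic outcome only; the raw callback is never logged or echoed back.
    const ok = Boolean(result) && result.ok === true;
    return { status: ok ? 200 : 400, done: true, ok };
  };
}
```

A caller would tear the listener down as soon as `done` is true, whether or not validation succeeded, so a failed attempt cannot be retried against the same `state`.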
4. Decision D5.3 — OS-keychain adapter surface
Question: the minimal keychain API the companion needs; per-OS backends; what does a keychain read grant?
Verified state
Phase 3 custody (companion-token-custody.mjs) is written against an injected
{ get, set, delete } adapter over four fixed accounts (KEYCHAIN_ACCOUNTS: accessToken,
refreshToken, sessionMeta, loopbackToken), with MAX_SECRET_LEN bounds and fail-closed loads.
The adapter calls are awaited (sync or Promise both work). Phase 5 supplies the real backend.
Decision — Exactly get/set/delete on four named accounts; nothing wider; device-local
- Minimal surface. The real adapter implements only `get(account)`, `set(account, secret)`, `delete(account)`. No `list`, no `enumerate`, no `getAll`, no wildcard/prefix query, no "dump." A compromised or buggy adapter cannot discover what it does not already name. Unknown account names are rejected fail-closed (the adapter accepts only the four `KEYCHAIN_ACCOUNTS` literals); enforce `MAX_SECRET_LEN`; never log or return a secret in an error.
- Per-OS backends (least-privilege, device-local):
  - macOS: Keychain Services generic-password items, accessibility `kSecAttrAccessibleWhenUnlockedThisDeviceOnly` (no iCloud Keychain sync; the JWT/refresh token must not leave the device), with the item ACL bound to the companion's signed code identity where available (tightened further by Phase 7 signing).
  - Windows: DPAPI (`CryptProtectData`) at per-user scope (never `LOCAL_MACHINE`), optionally surfaced through Credential Manager; entropy parameter set per item.
  - Linux: libsecret (Secret Service API) with schema-scoped attributes in the default collection. Known limitation: the Secret Service does not isolate by application, so any unlocked-session same-user process can read the collection. This is a platform constraint, not a defect we can close in the adapter; it is the primary reason the loopback token is separate, per-session, and rotated (so its blast radius is one session) and a reason Linux users who want stronger isolation should run a locked keyring.
- Threat: what a keychain read grants (confirmed, sobering). A successful read of these accounts yields the access-token JWT (act as the user against the hosted gateway with the `scopesForRole` ceiling, read and write), the refresh token (mint new JWTs until reuse-detection family-revoke trips), and the loopback token (drive the local inference front-door). That is data-plane account compromise, not merely local-inference access. Consequences and bounds:
  - The JWT is no worse than a stolen web-session JWT, and strictly better on refresh (the server-side gate backs native refresh with `refresh-token-core` rotation + reuse→family-revoke).
  - macOS `…ThisDeviceOnly` + ACL and Windows per-user DPAPI raise the bar to a same-user, same-device attacker; Linux is weaker (above).
  - The loopback token's separation means stealing it does not yield the JWT and vice-versa (Phase 3 custody §4: stored under its own account, rotated each start, independent of OAuth logout).
- No plaintext fallback, ever. If the OS keychain is unavailable/locked, the adapter fails closed (the operation errors and the dependent flow aborts); it never falls back to a dotfile, env var, or plaintext store. A locked keychain means "re-auth / retry," not "store insecurely."
Fail-closed
Unknown account → reject. Keychain unavailable/locked → error (no plaintext fallback). Corrupt/oversize
stored value → custody loadSession already returns null → caller treats as reauth.
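
A minimal sketch of the narrow adapter surface, assuming the per-OS backend is injected as an object with `get`/`set`/`delete`. The account strings mirror the `KEYCHAIN_ACCOUNTS` key names in the text, but their literal values, the `MAX_SECRET_LEN` figure, the `createKeychainAdapter` name, and the reason-code strings are placeholders for illustration.

```javascript
// Account names mirror the KEYCHAIN_ACCOUNTS keys in the text; the literal values
// and the MAX_SECRET_LEN figure are placeholders, not the real constants.
const KEYCHAIN_ACCOUNTS = Object.freeze(['accessToken', 'refreshToken', 'sessionMeta', 'loopbackToken']);
const MAX_SECRET_LEN = 8192;

function createKeychainAdapter(backend) {
  const allowed = new Set(KEYCHAIN_ACCOUNTS);
  const requireKnown = (account) => {
    if (typeof account !== 'string' || !allowed.has(account)) {
      throw new Error('keychain_unknown_account');
    }
  };
  // Map every backend failure to one fixed code so no secret or OS detail leaks,
  // and so there is no code path that falls back to a plaintext store.
  const guarded = async (op) => {
    try {
      return await op();
    } catch {
      throw new Error('keychain_unavailable');
    }
  };
  // Exactly three methods: nothing here can list, enumerate, or dump.
  return Object.freeze({
    async get(account) {
      requireKnown(account);
      return guarded(() => backend.get(account));
    },
    async set(account, secret) {
      requireKnown(account);
      if (typeof secret !== 'string' || secret.length === 0 || secret.length > MAX_SECRET_LEN) {
        throw new Error('keychain_bad_secret');
      }
      return guarded(() => backend.set(account, secret));
    },
    async delete(account) {
      requireKnown(account);
      return guarded(() => backend.delete(account));
    },
  });
}
```

The wrapper validates the account name before the backend is touched, so an unknown name never reaches the OS keychain at all.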
5. Decision D5.4 — Spawn adapter (process-management surface)
Question: the minimal spawn/kill/health-probe surface; what happens if the spawned process escapes supervision?
Verified state
Phase 4 RuntimeAdapterFns types the surface as spawn(opts) → SpawnHandle, handle.kill(),
healthCheck(handle) → boolean, statResources() (the last is D5.6). SpawnOpts carries
{ binaryPath, modelPath, port, maxRamBytes } — no vault/JWT/keychain field by construction.
Phase 4 §5.2 mandates spawn only after integrity passes, bind 127.0.0.1, ephemeral port, and the
Phase 2 guard in front.
Decision — spawn + kill + healthCheck only; hardened launch; supervised lifetime
- Minimal surface. `spawn`, `kill`, `healthCheck`, and nothing else (resources = D5.6, download = D5.5). The handle exposes `pid` + `kill()` only.
- Hardened launch (every item is a control, not a style choice):
  - Absolute `binaryPath` to the bundled runtime, never a `PATH` lookup (prevents `PATH`-injection of a malicious `ollama`).
  - `shell: false` + argv array, never a concatenated command string (no shell-injection from any spec field).
  - Scrubbed environment: the child receives a minimal allowlist of env vars (`HOME`/`TMPDIR`/locale as needed); it is explicitly stripped of `SESSION_SECRET`, any `*_API_KEY`, the JWT, refresh token, loopback token, and anything keychain-related. Env is the classic ambient-authority leak; this closes it (see D5.8).
  - Loopback bind flag passed to the runtime so it binds `127.0.0.1`/UDS only, plus the resource ceiling flags derived from `maxRamBytes`.
  - `detached: false`, child placed in the companion's process group so it cannot outlive an orphaning parent.
- The runtime back-end is not a second authority-bearing endpoint (resolves P-a/P-c at the runtime port): the companion exposes the guarded front-door (D5.1); the runtime listens behind it. Preference order for the back-end channel:
  - Preferred: a Unix-domain socket with mode `0600` (owner-only) where the runtime supports it. This makes the back-end reachable only by the companion's user and removes the loopback-TCP bypass entirely.
  - Fallback (TCP-only runtimes): bind the runtime to `127.0.0.1` on its own ephemeral port, emit no CORS, and accept the documented residual: any same-user local process (and a web page via no-cors, write-only) can spend local inference compute on it. This is acceptable only because the back-end holds no secret and no data path; it serves inference and nothing else (D5.8), so a direct hit wastes compute but exfiltrates nothing. The front-door remains the sole authenticated path; the rate limiter lives there.
- Escape / supervision threat model (confirmed):
  - Orphan holding the port/compute after a companion crash: on every start, the companion probes the persisted port for a stale runtime and kills it before rebinding (detect-and-reclaim). Because the loopback token is rotated each start (Phase 3 custody), a stale runtime from a prior session is inert: the new front-door's `expectedToken` no longer matches, and a stale back-end serves no one useful. On clean shutdown, kill the process group.
  - Poisoned runtime (model-driven RCE): the only path to spawn is a verified model (D5.5 + Phase 4 single-path-to-`ready`); the child runs with the scrubbed env (no secrets, no keychain, no JWT; D5.8) and loopback/UDS bind only, so even arbitrary code in the runtime process cannot read the keychain handle (it was never in scope), cannot read the JWT (not in env), and cannot reach the canister as the user (no token). Stronger OS sandboxing (seatbelt/AppContainer/seccomp, entitlements) is Phase 7; D5.4 mandates the env-scrub + no-shell + argv + process-group + loopback-bind baseline now.
- Health probe goes through the same admission path the runtime serves (the front-door, with the loopback token) so there is exactly one way to reach inference; the probe is `GET /v1/models` / `GET /api/tags`. A health success drives `transitionLifecycle(state, 'health_ok')` (Phase 4).
Fail-closed
Spawn fails / binary missing / integrity not yet verified → no spawn, lifecycle stays stopped,
companionAvailable false. Health probe fails the retry budget → transitionLifecycle(state, 'health_fail') → kill the child → stopped.
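
The hardened-launch items can be sketched as a pure function that produces the arguments for a `child_process.spawn`-style call. This is an illustration under assumptions: `buildSpawnPlan`, `isAbsolutePath`, the exact allowlist entries, and especially the runtime flag names (`--model`, `--host`, `--port`, `--max-ram-bytes`) are placeholders, since this gate does not specify the bundled runtime's CLI.

```javascript
// Hypothetical launch-plan builder for D5.4. The runtime flag names below are
// placeholders; the real CLI of the bundled runtime is not specified in this gate.
const ENV_ALLOWLIST = Object.freeze(['HOME', 'TMPDIR', 'LANG', 'LC_ALL']);

// Simplified absolute-path check (POSIX "/..." or Windows "C:\..."); a real
// implementation would likely use the platform path module instead.
function isAbsolutePath(p) {
  return typeof p === 'string' && (p.startsWith('/') || /^[A-Za-z]:[\\/]/.test(p));
}

function buildSpawnPlan({ binaryPath, modelPath, port, maxRamBytes }, parentEnv) {
  if (!isAbsolutePath(binaryPath) || !isAbsolutePath(modelPath)) {
    throw new Error('spawn_path_not_absolute');
  }
  if (!Number.isInteger(port) || port <= 0 || port > 65535) throw new Error('spawn_bad_port');
  if (!Number.isInteger(maxRamBytes) || maxRamBytes <= 0) throw new Error('spawn_bad_limit');

  // Allowlist, not denylist: anything not named here never reaches the child,
  // which covers SESSION_SECRET, *_API_KEY, tokens, and keychain-related vars.
  const env = {};
  for (const key of ENV_ALLOWLIST) {
    if (typeof parentEnv[key] === 'string') env[key] = parentEnv[key];
  }

  return {
    command: binaryPath, // absolute path, so no PATH lookup happens
    args: [
      '--model', modelPath,
      '--host', '127.0.0.1',
      '--port', String(port),
      '--max-ram-bytes', String(maxRamBytes),
    ],
    options: { shell: false, detached: false, env, stdio: 'ignore' },
  };
}
```

Building the environment from an allowlist means a secret added to the parent process later is excluded by default, without anyone having to remember to strip it.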
6. Decision D5.5 — Download adapter + Phase 4 integrity wiring
Question: how does the real TLS download feed bytes into `createIntegrityAccumulator`; where does `finalize()` live; who decides the model spec (`allowedSourceUrls`, `expectedDigest`, `expectedSizeBytes`)?
Verified state
Phase 4 createIntegrityAccumulator({ expectedDigest, expectedSizeBytes, sourceUrl, allowedSourceUrls })
exposes update(chunk) / finalize() / getReceivedBytes() / abort(); validateSourceUrl
enforces https:-only + allowlist; validateIntegritySpec enforces 64-char lowercase-hex digest +
positive integer size; finalize() uses constant-time digest compare + exact size. Phase 4 §3 fixes
the single path to ready: validate spec → accumulate → finalize().ok → start → spawn →
health_ok. RuntimeAdapterFns.download(url, onChunk) is the dumb byte pump.
Decision — Dumb download adapter; accumulator + finalize() owned by the orchestrator; trust anchor is a first-party manifest
- Adapter is a dumb, verifying-transport byte pump. `download(url, onChunk)`:
  - Performs an HTTPS-only GET with full TLS verification (`rejectUnauthorized` stays true; no insecure flag, ever). The `url` MUST have already passed `validateSourceUrl` against the spec's `allowedSourceUrls`.
  - Streams: calls `onChunk(chunk)` for every received byte, in order, with no buffering of the whole file (multi-GB models must not be loaded into RAM).
  - Makes no integrity decision. It does not compute or compare the digest; it cannot "report success." This separation means a compromised download adapter cannot fake verification, because the decision lives elsewhere.
- Where `finalize()` lives: the orchestrator, not the adapter. The companion's runtime-manager orchestration layer (the code wiring Phase 4's pure core to the real adapters) owns the accumulator:
  - Before the download: `validateSourceUrl` + `validateIntegritySpec` (fail-closed), then `createIntegrityAccumulator({...spec})`.
  - During: `await adapter.download(spec.url, (chunk) => acc.update(chunk))`.
  - After the stream ends: `const verdict = acc.finalize()`.
  - `verdict.ok === true` is the precondition for `transitionLifecycle(state, 'start')` and spawn. On `verdict.ok === false`: delete the downloaded file, log only the fixed reason code, and refuse to spawn (lifecycle stays `stopped`). This is exactly Phase 4 §3 / §5.1.
- Atomic, TOCTOU-aware file handling. Download to a temp path in a companion-private directory (`0700`), `fsync`, verify via `finalize()`, then atomically rename into the verified model path. Spawn only from the verified path. To avoid a verify-then-swap (TOCTOU) where another same-user process replaces the file between verify and spawn, keep the verified file in the companion-private directory and prefer launching from a held descriptor / re-stat the inode identity at spawn; treat any mismatch as integrity failure.
- Who decides the spec: the trust anchor (most security-critical answer). The model spec (`allowedSourceUrls`, `expectedDigest`, `expectedSizeBytes`) comes from a first-party, signed model manifest that is independent of the model download channel. Concretely, the `expectedDigest` must originate from a source the download/CDN attacker does not control:
  - Preferred: a manifest fetched from the Knowtation hosted gateway over TLS (a trusted first-party origin, distinct from the model CDN), and/or baked into the signed companion (Phase 7). The manifest fetch is performed by the authority group (it may carry the JWT); only the plain resolved values (`url`, `digest`, `size`) are then handed to the runtime group, so the download adapter never sees the JWT (D5.8).
  - Forbidden: taking the digest/size from the same host that serves the model bytes (circular trust: an attacker who controls the CDN would control both bytes and "expected" digest), or from user-supplied input, or from an unauthenticated channel.
  - Manifest authenticity itself (signature scheme, key rotation) is the supply-chain detail; the binding decision here is: the trust anchor is first-party and out-of-band from the model host, fail-closed if it cannot be authenticated.
Fail-closed
No authenticated manifest → no download. validateSourceUrl/validateIntegritySpec fail → no
download. Stream error mid-flight → acc.abort() → finalize() returns accumulator_aborted →
delete temp, no spawn. finalize().ok === false (size/digest mismatch) → delete, no spawn.
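
The before/during/after wiring can be sketched as one orchestration step with every dependency injected. This is an illustration under assumptions: `downloadAndVerify`, `tempPath`, and `deleteFile` are hypothetical names; `createAccumulator` stands in for Phase 4's `createIntegrityAccumulator`; the verdict is assumed to have the shape `{ ok, reason }`; and the URL is read from `spec.sourceUrl` to match the accumulator's parameter name (the text above writes `spec.url`). Spec validation is assumed to have already passed.

```javascript
// Hypothetical orchestration step for D5.5. `createAccumulator` stands in for
// Phase 4's createIntegrityAccumulator; the { ok, reason } verdict shape is assumed.
async function downloadAndVerify({ spec, tempPath, adapter, createAccumulator, deleteFile }) {
  const acc = createAccumulator(spec);
  try {
    // The adapter only pumps bytes; it never sees the digest or the verdict.
    await adapter.download(spec.sourceUrl, (chunk) => acc.update(chunk));
  } catch {
    acc.abort(); // stream error mid-flight: finalize() below must now fail
  }
  const verdict = acc.finalize();
  if (!verdict || verdict.ok !== true) {
    await deleteFile(tempPath); // never leave unverified bytes on disk
    return { ok: false, reason: (verdict && verdict.reason) || 'integrity_failed' };
  }
  // Only now may the caller rename into the verified path and proceed to spawn.
  return { ok: true, verifiedTempPath: tempPath };
}
```

The adapter's return value is ignored on purpose: the only thing that can produce `ok: true` is the accumulator's own verdict.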
7. Decision D5.6 — Resource-probe adapter
Question: what OS APIs; what privacy risk (does probing VRAM expose what other apps are doing); is it acceptable?
Verified state
Phase 4 evaluateResourceLimits(observation, limits) consumes a ResourceObservation
{ ramBytes, vramBytes, cpuPercent } and fails closed on malformed input. RuntimeAdapterFns.statResources()
is the deferred probe. Phase 4 §4 suggests caching the observation ≤ 500 ms.
Decision — Probe the runtime's own PID; VRAM as aggregate headroom only; never enumerate other processes; no privilege escalation
1. OS APIs (scoped to the runtime PID):
   - RAM (RSS): macOS proc_pidinfo/task_info; Linux /proc/<pid>/statm or smaps_rollup; Windows GetProcessMemoryInfo. Keyed on the runtime child's PID only, never system-wide RAM.
   - CPU%: per-process CPU time deltas for the runtime PID (same per-OS sources), sampled over an interval.
   - VRAM: the privacy-sensitive one. There is no portable per-process VRAM accounting without vendor tooling (nvidia-smi, Metal/IOKit counters, DXGI). nvidia-smi --query-compute-apps reports per-process VRAM across all GPU processes, which would expose other applications' PIDs/names/footprints.
2. Privacy risk + decision (confirmed): probing VRAM can expose what other apps are doing on the GPU. Therefore Phase 5 MUST:
   - Read VRAM as aggregate device headroom (total/free) or the runtime PID's own usage, and discard everything else immediately. Other processes' PIDs, names, and usage are never parsed into the observation, never logged, never persisted.
   - Never enumerate the GPU process table for any purpose beyond extracting the runtime's own line / the aggregate scalar.
   - No privilege escalation for telemetry. If per-process VRAM requires elevated rights, skip it (treat vramBytes = 0 / maxVramBytes = Infinity) rather than escalate. Telemetry must never be a reason to ask for more OS privilege than inference needs.
3. Acceptable? Yes, under (1)+(2): scoped to the runtime PID, aggregate-only VRAM, no enumeration, no escalation, nothing about other apps retained. This keeps resource enforcement (the OOM defense, Phase 4 threat b) while honoring the design gate's privacy posture.
4. Cache the observation ≤ 500 ms (Phase 4 §4) to bound syscall overhead.
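A minimal sketch of the PID-scoped probe, the own-line VRAM extraction, and the ≤ 500 ms cache described above. It is illustrative only and is not contract text. The names parseStatmRssBytes, extractOwnVramBytes, and makeCachedProbe are hypothetical; the 4096-byte page size and the "pid, used MiB" CSV shape (an nvidia-smi query formatted as csv,noheader,nounits) are assumptions, and the real statResources() may well be asynchronous.

```javascript
// Hypothetical helpers; none of these names come from the Phase 4/5 code.

// Parse Linux /proc/<pid>/statm text. The second field is resident pages.
// The page size is an assumption; a real probe would query it from the OS.
function parseStatmRssBytes(statmText, pageSizeBytes = 4096) {
  const fields = String(statmText).trim().split(/\s+/);
  const residentPages = Number(fields[1]);
  if (fields.length < 2 || !Number.isInteger(residentPages) || residentPages < 0) {
    return null; // malformed: the caller must fail closed
  }
  return residentPages * pageSizeBytes;
}

// Keep only the runtime's own line from "pid, used_mib" CSV text (assumed shape).
// Lines belonging to other processes are skipped without being stored or logged.
function extractOwnVramBytes(csvText, runtimePid) {
  for (const line of String(csvText).split('\n')) {
    const m = /^\s*(\d+)\s*,\s*(.*?)\s*$/.exec(line);
    if (!m || Number(m[1]) !== runtimePid) continue;
    if (!/^\d+$/.test(m[2])) return null; // own line unreadable: fail closed
    return Number(m[2]) * 1024 * 1024;
  }
  return 0; // the runtime PID holds no GPU memory in this listing
}

// Wrap a probe so repeated calls within maxAgeMs reuse the last observation.
// A null (failed) observation is never reused; the next call probes again.
function makeCachedProbe(probeFn, maxAgeMs = 500, now = Date.now) {
  let cached = null;
  let cachedAt = -Infinity;
  return function statResources() {
    const t = now();
    if (cached !== null && t - cachedAt <= maxAgeMs) return cached;
    cached = probeFn();
    cachedAt = t;
    return cached;
  };
}
```

Both parsers return null on malformed input so that the orchestrator can hand a malformed observation to evaluateResourceLimits and let Phase 4 deny, rather than substituting a guess.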
Fail-closed
Probe fails / returns malformed → evaluateResourceLimits returns malformed_observation → the
per-request gate denies inference (refuse-rather-than-run-blind into OOM). A self-inflicted denial
is the safe outcome; the alternative (running blind) risks the user's whole machine.
8. Decision D5.7 — Phase 1 seam activation (companionAvailable)
Question: when exactly does companionAvailable flip to true? (Must be after integrity verified and lifecycle ready and a health round-trip succeeds.)
Verified state
Phase 1 lib/model-runtime-lane.mjs: selectLane returns 'local' when inBrowserAvailable || companionAvailable; companionAvailable defaults false (fail-closed) and is documented as "set
by Phase 5 only when canServeInference(lifecycle) is true." Phase 4 §3 step 8 + §5.3 step 10 fix the
ordering: set true only after health_ok, false on drain/stop.
Decision — True only when ALL of {integrity-verified ∧ lifecycle ready ∧ recent health round-trip} hold; false on any doubt
- Flip to true only when every condition holds simultaneously:
  - Integrity verified: the model's acc.finalize().ok === true for the running model (D5.5).
  - Lifecycle ready: canServeInference(lifecycle) === true, reached only via stopped → starting → ready on a successful health_ok (Phase 4; there is no direct stopped → ready).
  - Real health round-trip: at least one end-to-end probe through the guarded front-door (loopback token presented, admitted, runtime answered correctly) has succeeded, not merely "the process spawned."
  - Plus: the front-door listener is bound and the loopback token is stored (rotateLoopbackToken done), so an admitted caller can actually be served.
- Recency bound (anti-staleness, P-i): companionAvailable must be backed by a recent successful health round-trip. Phase 5 re-probes on an interval; if the last success is older than the threshold (or a probe fails), treat the flag as false until re-confirmed. This stops selectLane from routing to a silently-dead runtime.
- Flip to false immediately on any of: drain/stop (Phase 4 transitionLifecycle), health_fail, a resource-limit-triggered drain, detected runtime crash/exit, keychain/loopback-token loss, or companion shutdown. Default and ambiguity → false. Never set true optimistically.
- Scope of the flag. companionAvailable means "local inference is reachable and ready on this device." It does not assert auth/consent: write-back of enrichment still passes Phase 1's enforceConsentPolicy and needs a valid session (D5.3 custody decide() !== 'reauth'). Keeping the flag scoped to inference-readiness avoids conflating lane capability with lane permission.
- Mechanism. Phase 5's binding layer computes the flag strictly from canServeInference(lifecycle) ∧ recent-health-ok and writes it into the live LaneCapabilities passed to selectLane. The value is never cached beyond the recency bound.
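The conjunction and recency bound above can be sketched as a pure predicate. This is a hedged illustration, not the Phase 5 API: the function name computeCompanionAvailable, the field names on the state object, and the maxHealthAgeMs parameter are all hypothetical, and this document does not fix a value for the recency threshold.

```javascript
// Hypothetical predicate; every name here is illustrative.
// Returns true only when every condition is explicitly true and the last
// successful health round-trip is recent. Anything missing, non-boolean,
// non-finite, or stale yields false.
function computeCompanionAvailable(state, nowMs, maxHealthAgeMs) {
  if (!state || typeof state !== 'object') return false;
  const {
    integrityOk,          // acc.finalize().ok === true for the running model
    lifecycleReady,       // canServeInference(lifecycle) === true
    listenerBound,        // front-door listener is bound
    loopbackTokenStored,  // loopback token rotation has completed
    lastHealthOkAtMs,     // timestamp of the last successful round-trip
  } = state;

  if (integrityOk !== true) return false;
  if (lifecycleReady !== true) return false;
  if (listenerBound !== true || loopbackTokenStored !== true) return false;
  if (!Number.isFinite(lastHealthOkAtMs) || !Number.isFinite(nowMs)) return false;
  if (!Number.isFinite(maxHealthAgeMs) || maxHealthAgeMs <= 0) return false;

  const ageMs = nowMs - lastHealthOkAtMs;
  // A negative age means clock trouble; treat it as doubt, hence false.
  return ageMs >= 0 && ageMs <= maxHealthAgeMs;
}
```

Using strict `=== true` comparisons rather than truthiness keeps an undefined or string-valued field from being read as a satisfied condition.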
Fail-closed
Any missing condition, stale health, or ambiguity → companionAvailable = false → selectLane falls
through the chain (in-browser → self_hosted → … → disabled), exactly the D2.2 fallback.
9. Decision D5.8 — No-ambient-authority enforcement mechanism
Question: what mechanism prevents the spawn/download adapter from ever holding a vault handle, JWT, or keychain read capability?
Verified state
Phase 4 already guarantees the decision core imports no vault/canister/keychain/auth module and
that RuntimeAdapterFns carries no authority accessor (structural). Design gate §4.6 forbids ambient
authority on the loopback endpoint. The gap Phase 5 closes: the real adapters and the wiring layer
must preserve that separation when authority objects (keychain, JWT) finally exist in the same process.
Decision — Object-capability segregation, enforced by tests, not convention
- Two disjoint capability groups, constructed in separate scopes:
  - Authority group (holds secrets/handles): the OS-keychain adapter (D5.3), the OAuth/session controller (JWT + refresh via companion-token-custody/companion-oauth-pkce), and the canister/vault client. Instantiated and held only by the session/auth controller.
  - Runtime group (no secrets): the RuntimeAdapterFns implementations (spawn, download, healthCheck, statResources). Constructed with no reference to any authority-group object.
- The runtime group receives only inert data: a verified file path, a port, a validated URL, and resource limits. It is never passed the keychain adapter, the JWT, the refresh token, or a canister handle. The model manifest fetch (which may need the JWT) is done by the authority group, which hands the runtime group only the resolved { url, digest, size } (D5.5), so the download adapter sees a URL, never a token.
- Environment scrub is part of the capability boundary (D5.4): the spawned child's env is a minimal allowlist with SESSION_SECRET, *_API_KEY, JWT, refresh/loopback tokens, and keychain references removed, closing the env-as-ambient-authority leak that an import-graph check alone would miss.
- Enforced, not merely intended. Phase 5's 7-tier suite MUST include:
  - an architecture/import test asserting the runtime-adapter module and companion-runtime-manager import none of { companion-token-custody, companion-oauth-pkce, keychain backend, canister/vault client };
  - a child-env-scrub security test asserting the spawned process environment contains none of the secret-bearing keys;
  - a surface test asserting RuntimeAdapterFns and SpawnOpts/SpawnHandle expose no authority accessor (Phase 4 already; re-assert against the real impl);
  - a download-adapter test asserting it receives only a URL + chunk sink, never a token.
- The Phase 2 guard remains the only admission path and its verdict the only output — an admitted inference request reaches the runtime and nothing else; it cannot pivot to vault/JWT because those handles do not exist anywhere reachable from the runtime group (structural, now test-enforced).
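A minimal sketch of the allowlist-based environment scrub and the check a child-env-scrub test could run. Everything named here is an assumption for illustration: the allowlist contents, the secret-key pattern, and the helper names buildChildEnv and findSecretKeys are not taken from D5.4.

```javascript
// Hypothetical allowlist; the real set of permitted keys is a D5.4 matter.
const CHILD_ENV_ALLOWLIST = ['PATH', 'HOME', 'TMPDIR', 'LANG'];

// Hypothetical pattern for secret-bearing key names.
const SECRET_KEY_PATTERN = /(SECRET|TOKEN|JWT|_API_KEY$|KEYCHAIN)/i;

// Build the child's environment by copying allowlisted keys only.
// The parent environment is never spread or inherited wholesale.
function buildChildEnv(parentEnv, allowlist = CHILD_ENV_ALLOWLIST) {
  const childEnv = {};
  for (const key of allowlist) {
    if (SECRET_KEY_PATTERN.test(key)) {
      // A secret-looking key in the allowlist is a configuration bug: refuse.
      throw new Error(`allowlist contains secret-bearing key: ${key}`);
    }
    if (typeof parentEnv[key] === 'string') childEnv[key] = parentEnv[key];
  }
  return childEnv;
}

// Test helper: list any keys in an environment that look secret-bearing.
function findSecretKeys(env) {
  return Object.keys(env).filter((key) => SECRET_KEY_PATTERN.test(key));
}
```

An allowlist is used instead of deleting known-bad keys so that a secret added to the parent environment later is excluded by default.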
Fail-closed
If the wiring cannot construct the runtime group without an authority reference (e.g., a refactor introduces a shared singleton), the architecture test fails the build — the merge is blocked, not shipped with a warning.
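One way the build-blocking architecture/import test could be approached is a scan of each runtime-group source file for forbidden module specifiers. The sketch below is an assumption-laden illustration: findForbiddenImports and the FORBIDDEN_SPECIFIERS substrings are hypothetical, the regex is a heuristic rather than a parser, and it inspects direct imports only, so a real test would also need to walk transitive imports.

```javascript
// Hypothetical substrings identifying authority-group modules.
const FORBIDDEN_SPECIFIERS = [
  'companion-token-custody',
  'companion-oauth-pkce',
  'keychain',
  'vault',
  'canister',
];

// Return every import specifier in sourceText that names a forbidden module.
// Covers `... from '...'`, bare `import '...'`, dynamic `import('...')`,
// and `require('...')`. Matches inside comments or strings also count,
// which errs on the side of failing the build.
function findForbiddenImports(sourceText, forbidden = FORBIDDEN_SPECIFIERS) {
  const importRe =
    /\bfrom\s*['"]([^'"]+)['"]|\bimport\s*\(?\s*['"]([^'"]+)['"]|\brequire\s*\(\s*['"]([^'"]+)['"]/g;
  const specifiers = [];
  let match;
  while ((match = importRe.exec(sourceText)) !== null) {
    specifiers.push(match[1] || match[2] || match[3]);
  }
  return specifiers.filter((spec) => forbidden.some((name) => spec.includes(name)));
}
```

A test would read the runtime-adapter module and companion-runtime-manager from disk, pass each file's text to this function, and fail if the returned list is non-empty.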
10. How Phase 5 discharges the prior phases' deferred obligations
| Source obligation | Discharged by |
|---|---|
| Phase 2 §6 (1) loopback bind, (2) ephemeral port, (4) allowedHosts from bound port, (7) no permissive CORS | D5.1 |
| Phase 2 §6 (3) CSPRNG per-session token to keychain | D5.3 + Phase 3 rotateLoopbackToken |
| Phase 3 §6 (1) system browser, (2) loopback redirect bind + ephemeral port, (4) callback validation w/ expectedIssuer, (5) TLS token POST, (6) keychain custody, (7) refresh drive, (8) loopback-token rotate | D5.2 (browser, redirect bind, callback), D5.3 (keychain), and the orchestration calling Phase 3's pure descriptors |
| Phase 4 §5.1 download + integrity, §5.2 spawn, §5.3 health loop, §5.4 per-request gate, §5.6 minimal logging, §5.7 no ambient authority | D5.5 (download/integrity), D5.4 (spawn/health), D5.6 (resource probe), D5.8 (no ambient authority) |
| Phase 4 §3 step 8 / §8 G2 — Phase 1 seam activation | D5.7 |
| Server-side OAuth gate (✅ DONE) — native client at /api/v1/auth/native, iss emission, loopback variable-port, scope ceiling | Consumed by D5.2 (the companion is the native client; passes expectedIssuer, binds the loopback redirect) |
Remaining external dependency: none for first-party run-from-source. The server-side gate (G1) is DONE; this document is G2 (Phase 4 §8). Phase 5 implementation may proceed against this contract.
11. 7-tier test obligations (Phase 5 bind/lifecycle layer)
Aaron's Rule #0. The pure cores' suites (Phases 2–4: 102 + 100 + 35 + 219 cases) do not absolve
the bind layer of its own tests. Before any merge to main, the Phase 5 shell ships all seven tiers:
| Tier | Focus |
|---|---|
| Unit | Bind helpers (loopback-only assertion, ephemeral-port allocation, allowedHosts from bound port); keychain adapter per backend (get/set/delete on the four accounts; unknown-account reject; no-list surface); spawn-opts hardening (absolute path, shell:false, argv, env-scrub); download adapter HTTPS-only + chunk pump; resource probe PID-scoping; companionAvailable predicate. |
| Integration | OAuth PKCE loopback round-trip against the native provider (browser-open stubbed) → keychain custody; download → createIntegrityAccumulator → finalize() → start → spawn → health_ok → companionAvailable=true; guard-in-front-of-runtime request path; refresh rotation → updateAccessToken; reuse → clearSession. |
| End-to-end | Sign in → fetch manifest (first-party) → download → verify → spawn → enrich a note locally via the guarded front-door → result handled per §5/D3 policy; failure branches: integrity fail (no spawn), health fail (stopped), keychain locked (reauth). |
| Stress | Concurrent inference through the front-door at maxInFlight/queueBound; many auth attempts (redirect-listener bind/teardown churn); stale-runtime reclaim across forced restarts; large streamed download to the accumulator. |
| Data-integrity | Provenance fields on derived artifacts (deferred write-back is Phase 6, but the inference result's metadata is asserted); finalize() rejects 1-bit corruption end-to-end; loopback token rotates each start (old token inert); no secret persisted outside the keychain. |
| Performance | Front-door admission overhead bound; runtime cold-start; resource-probe ≤ 500 ms cache honored; no event-loop starvation under streamed download. |
| Security (centerpiece) | Loopback-only bind (reject 0.0.0.0/::/routable); ephemeral-port not used as a control; DNS-rebinding + cross-origin still 403 at the front-door; runtime back-end carries no authority and emits no CORS; keychain read surface minimal + device-local (no iCloud sync); child-env contains no secret; architecture/import test: runtime group imports no authority module; manifest trust-anchor is out-of-band from the model host; resource probe never enumerates other GPU processes; companionAvailable fail-closed; no secret in any log/error/redirect/adapter interface; global fail-closed posture. |
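As one concrete instance of the Security tier's loopback-only bind check, the assertion could be sketched as below. This is a hypothetical helper under stated assumptions: the name isLoopbackBindAddress and the choice to accept only the literal addresses 127.0.0.1 and ::1 are illustrative, and D5.1 remains the authority on which addresses are permitted.

```javascript
// Hypothetical strict check: only literal loopback addresses are accepted.
// Hostnames such as "localhost" are rejected because they depend on name
// resolution, and wildcard addresses (0.0.0.0, ::) bind every interface.
const ALLOWED_BIND_ADDRESSES = new Set(['127.0.0.1', '::1']);

function isLoopbackBindAddress(address) {
  return typeof address === 'string' && ALLOWED_BIND_ADDRESSES.has(address);
}

// Throws instead of returning a boolean so a caller cannot ignore the result.
function assertLoopbackBind(address) {
  if (!isLoopbackBindAddress(address)) {
    throw new Error('refusing to bind: address is not a literal loopback address');
  }
  return address;
}
```

The same assertion can be re-run against the address the listener reports after binding, so the test covers what was actually bound and not only what was requested.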
12. Constraints honored
- Decisions only — no companion shell code. This document writes none; it fixes the contract the implementation must obey.
- Muse-canonical, on feat/companion-app, paired with the Phase 1–4 code already there; not a docs-only PR to main.
- Fail-closed on every ambiguous design point (bind, custody, spawn, download, probe, seam).
- Security first; no ambient authority; no secret in any log, error, or adapter interface.
- No assumptions stated as fact — every cross-reference is anchored to a verified file/section in Phases 1–4 and the server-side OAuth gate.
- Phase 5 lifts only the bounded I/O subset of the design gate's "DOES NOT approve" list (§0); packaging/signing/notarization/auto-update remain Phase 7 and are not approved here.
13. Approval table
| Decision | Recommendation | Owner approval |
|---|---|---|
| D5.1 — inference loopback bind: OS-assigned ephemeral, loopback-only, port-secrecy not a control, no SO_REUSEPORT, two-listener separation | ACCEPT | ☐ pending |
| D5.2 — OAuth redirect: separate, one-shot, ephemeral loopback listener; pass expectedIssuer; system browser only | ACCEPT | ☐ pending |
| D5.3 — keychain adapter: get/set/delete on four fixed accounts only; device-local (macOS ThisDeviceOnly / Windows per-user DPAPI / Linux libsecret w/ documented limit); no plaintext fallback | ACCEPT | ☐ pending |
| D5.4 — spawn adapter: spawn/kill/healthCheck; absolute path, shell:false, argv, env-scrub, process-group; runtime back-end via UDS 0600 (else loopback TCP, no authority); detect-and-reclaim stale runtime | ACCEPT | ☐ pending |
| D5.5 — download adapter: dumb HTTPS-only byte pump; accumulator + finalize() owned by orchestrator; trust anchor = first-party signed manifest, out-of-band from the model host; atomic temp→verified, TOCTOU-aware | ACCEPT | ☐ pending |
| D5.6 — resource probe: runtime-PID-scoped; VRAM aggregate-only, never enumerate other processes; no privilege escalation; fail-closed deny on probe failure | ACCEPT | ☐ pending |
| D5.7 — companionAvailable true only when integrity-verified ∧ ready ∧ recent health round-trip; recency-bounded; false on any doubt | ACCEPT | ☐ pending |
| D5.8 — no ambient authority: object-capability segregation (authority vs runtime groups), env-scrub, enforced by architecture/import + env-scrub tests (build-blocking) | ACCEPT | ☐ pending |
On owner approval of D5.1–D5.8, the Phase 5 implementation (companion shell, run-from-source) is
unblocked — itself gated on the §11 7-tier test obligation before any merge to main. Phase 7
(packaging, signing, notarization, auto-update integrity) remains a separate, later gate and is not
approved by this document.