
Companion App — Phase 5: Bind Gate (sockets · spawn · keychain · download)

Status: 🧠 Thinking design gate, RATIFICATION REQUESTED. This document makes design decisions only. No companion shell code is written or approved to run by this document. It fixes the contract under which the Phase 5 implementation may, for the first time, open a real socket, spawn a real process, read the real OS keychain, and perform a real TLS download.

Branch: feat/companion-app (Muse-canonical; paired with the Phase 1–4 code already on this branch, not a docs-only PR to main, per the owner's no-docs-only-PR-to-main policy).

Phase table ref: Gate §12, Phase 5 ("companion app shell, integrating phases 2–4"). The phase table marks the implementation ⚡ Sonnet/auto; this bind-gate design is elevated to 🧠 Thinking because it is the first phase that performs ambient-authority-bearing I/O, and a wrong seam here is an ambient-authority vulnerability, a supply-chain compromise, or an OS-permission overreach. Mirrors the 🔀 Hybrid pattern: design the seam with a thinking model, implement against the fixed contract with Sonnet/auto.

Depends on (all accepted):

  • COMPANION-APP-DESIGN-AND-AUTHORIZATION-GATE.md (§4 the eight loopback controls, §4.6 no ambient authority, §7 packaging, §12 phase table, the "DOES NOT approve" list)
  • Phase 0 (gate §13, D1–D3)
  • Phase 1 (COMPANION-APP-PHASE-1-ADAPTER-SEAM.md, lib/model-runtime-lane.mjs)
  • Phase 2 (COMPANION-APP-PHASE-2-LOOPBACK-SECURITY.md, lib/companion-loopback-guard.mjs)
  • Phase 3 (COMPANION-APP-PHASE-3-OAUTH-PKCE.md, lib/companion-oauth-pkce.mjs, lib/companion-token-custody.mjs)
  • Phase 4 (COMPANION-APP-PHASE-4-RUNTIME-MANAGER.md, lib/companion-runtime-manager.mjs)
  • The server-side OAuth gate ✅ DONE (COMPANION-APP-OAUTH-SERVERSIDE-GATE.md, hub/gateway/native-oauth-provider.mjs)


Simple summary

Until now, every piece of the companion app has been built as pure rules with no hands — it can decide whether a request is safe, whether a sign-in is valid, whether a downloaded model is genuine, and whether the runtime may serve a request, but it cannot actually open a door, start a program, read your password vault, or download a file. This phase is where the rules grow hands.

That is the single most dangerous step in the whole project, so this document does not write any of that code. It writes the rulebook the hands must obey — and argues every rule against a specific attacker:

  • The two doors (one for local AI requests, one for the brief sign-in reply) open on a random internal number, on your machine only, never to the outside network.
  • The password vault (OS keychain) is touched through the narrowest possible opening: store one secret, read one secret, delete one secret — nothing that can list or dump everything.
  • Starting the AI program is done in a locked-down way: exact program, no shell, a stripped-down environment so the AI program can never see your sign-in token or vault, and it dies when the companion dies.
  • Downloading a model checks the file's fingerprint against a trusted record before the program is ever run on it, and the trusted record comes from somewhere the download server can't forge.
  • Checking how much memory/GPU is in use is scoped to the AI program only, and must never spy on what your other apps are doing.
  • The "AI is ready on this device" switch only flips on after the file is verified, the program is genuinely up, and a real round-trip check succeeds — and flips off the instant anything is wrong.
  • Finally: the parts with hands (start a program, download a file) are physically built so they can never hold your sign-in token, your vault handle, or the keychain — they only get a file path, a port, and a URL.

Technical summary

Phases 1–4 and the server-side OAuth gate delivered every decision core and the one server-side route (/api/v1/auth/native) the companion needs. Each pure module deferred its real I/O to "Phase 5 behind an explicit gate." This document is that gate. It ratifies eight decisions (D5.1–D5.8) covering the five I/O-bearing seams that converge in the companion shell: socket bind (the Phase 2 inference listener and the Phase 3 OAuth loopback redirect listener), the OS-keychain adapter (custody of the JWT, refresh token, and Phase 2 loopback token), the spawn adapter (the bundled Ollama/llama.cpp runtime), the download adapter (TLS model fetch wired into Phase 4's createIntegrityAccumulator), and the resource-probe adapter. It also ratifies the Phase 1 seam activation rule (companionAvailable) and the no-ambient-authority enforcement mechanism.

Every decision defaults fail-closed: any ambiguity at bind, custody, spawn, download, or probe denies rather than proceeds. No secret (loopback token, JWT, refresh token, authorization code, code_verifier, SESSION_SECRET, model digest) appears in any log, error, adapter interface, or redirect. The implementation that follows this gate runs from source on a first-party machine and must ship its own 7-tier suite for the bind/lifecycle layer before any merge to main; packaging, code-signing, notarization, and auto-update remain Phase 7 and are not approved here.


0. What this gate lifts — and what it deliberately does not

The design gate's "DOES NOT approve (no code)" list named, among others, "opening any new local HTTP listener / loopback model endpoint" and "shipping any companion binary … or bundled runtime." Phase 5 is the gate that lifts a bounded subset of that list — and only that subset.

| Item from the design gate's "DOES NOT approve" list | Phase 5 disposition |
| --- | --- |
| Opening a new local HTTP listener / loopback model endpoint | LIFTED for the two loopback listeners specified in D5.1/D5.2 (inference + OAuth redirect), bound 127.0.0.1/[::1] only, under the Phase 2 guard. |
| Running an integrated companion process that binds sockets, spawns the runtime, reads the keychain, downloads a model | LIFTED for first-party run-from-source on the owner's/developer's machine, under D5.1–D5.8. |
| Shipping a companion binary, tray installer, auto-updater, code-signing/notarization | NOT lifted: remains Phase 7 (distribution gate). Phase 5 runs from source; it does not produce or distribute a signed artifact. |
| New canister routes / new Hub REST endpoints / wire-protocol changes | NOT needed: the one required server route (/api/v1/auth/native) was delivered by the server-side OAuth gate (✅ DONE). Phase 5 adds none. |
| New derived-artifact storage paths / encryption scheme | NOT lifted: remains Phase 6. Phase 5 performs inference; enrichment write-back storage is Phase 6. |
| Any change to OAuth client registration or scopes | NOT lifted / already bounded: the server-side OAuth gate fixed the native client + scope ceiling (scopesForRole). Phase 5 only consumes it as a client; it changes nothing server-side. |

Net: Phase 5 means "the rules may now have hands, on a first-party machine, under the contract below." It does not mean "ship a product." Distribution is a separate, later gate.


1. Adversarial threat model (the bind surface)

Phases 2 and 3 modelled the request and the protocol. Phase 5 adds the physical I/O, so the threat model expands to the host itself. Each attacker is paired with the exact Phase 5 control that stops it, argued against the attacker — not pattern-matched.

| # | Attacker capability | Exact Phase 5 control | Decision |
| --- | --- | --- | --- |
| P-a | Malicious web page issues fetch() to the companion's loopback listener (and to the runtime's internal port). | Phase 2 guard on the front-door (token + Host + Origin/Sec-Fetch-Site + no permissive CORS); the runtime's internal port emits no CORS and carries no authority (a no-cors POST can spend compute but can never read a response cross-origin or reach data). | D5.1, D5.4 |
| P-b | DNS-rebinding to reach the loopback bind. | Loopback-only bind (127.0.0.1/[::1], never 0.0.0.0/::) + Phase 2 Host allowlist. Ephemeral port is not relied on as a control. | D5.1 |
| P-c | Local same-user malware connects directly to the listener or the runtime's internal port, bypassing the front-door. | Per-session bearer token on the front-door (Phase 2); the runtime back-end is reachable only via a Unix-domain socket (mode 0600) where supported, else a loopback TCP port that carries inference only, no authority and no secret, so direct access spends local compute but exfiltrates nothing. Rate-limited. | D5.1, D5.4, D5.8 |
| P-d | Local process races to bind the OAuth redirect port / inference port, or steals it after the companion exits. | OS-assigned ephemeral port chosen per use; redirect listener is one-shot (bound only for one auth attempt, closed after one callback); no SO_REUSEPORT; stale-runtime reclaim on start; loopback token rotated each start so a stale listener is inert against the new session. | D5.1, D5.2, D5.4 |
| P-e | Keychain read by another same-user process (or a backup/cloud-sync exfil). | Narrowest adapter surface (get/set/delete on four fixed accounts, no enumerate/dump); macOS …ThisDeviceOnly accessibility (no iCloud sync); ACL bound to the app where the OS supports it; the loopback token is separate, per-session, rotated so its compromise is not the JWT's compromise. Linux Secret Service's weaker same-user isolation is documented as a known platform limitation. | D5.3 |
| P-f | Supply-chain: poisoned model file from a compromised CDN/mirror or MITM (valid TLS, malicious bytes). | Phase 4 createIntegrityAccumulator: SHA-256 + exact size, constant-time digest compare, before spawn. The trust anchor (expectedDigest) comes from a first-party signed manifest, not the model host, so the entity that controls the download cannot also forge the digest. Atomic temp→verified move; spawn only from the verified path. | D5.5 |
| P-g | Escaped/orphaned runtime keeps running, holding a port or compute, after the companion crashes; or a poisoned runtime attempts RCE→data exfil. | Spawn in the companion's process group (dies with it); stale-runtime detect-and-kill on start; scrubbed environment (no SESSION_SECRET/JWT/keychain in the child env); no shell, argv array, absolute binary path; the runtime is handed only a file path + port + resource flags, never a vault/JWT/keychain handle. | D5.4, D5.8 |
| P-h | Resource-probe privacy leak: GPU telemetry exposes other apps' processes/VRAM. | Probe is scoped to the runtime's own PID; VRAM read as aggregate headroom only; other processes' PIDs/names/usage are never enumerated, logged, or persisted; no privilege escalation for telemetry (skip VRAM rather than escalate). | D5.6 |
| P-i | Premature lane selection: the local lane is chosen against a runtime that is downloading, starting, dead, or unverified. | companionAvailable flips true only after integrity verified and lifecycle ready and a real health round-trip succeeded, backed by a recency bound; flips false immediately on drain/stop/health-fail/crash. Fail-closed default false. | D5.7 |
| P-j | Ambient-authority pivot: the spawn/download/probe path acquires a vault handle, the JWT, or keychain capability and exfiltrates. | Object-capability segregation: the authority group (keychain + OAuth/session + canister) and the runtime group (spawn/download/health/stat) are constructed in disjoint scopes with no shared reference; enforced by an architecture/import test + a child-env-scrub test. | D5.8 |
| P-k | Secret in logs/errors at any I/O boundary (the new, dangerous surface). | All adapters log fixed reason codes only; never the token, JWT, refresh token, code, code_verifier, digest, URL, or model path. Reuses Phases 2–4's "no secret in output" posture at the I/O layer. | all |
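
The "fixed reason codes only" control in row P-k could look roughly like the sketch below. The code set and the names createReasonLogger and REASON_CODES are hypothetical; the real codes live in the Phase 2–4 modules and are not reproduced here.

```javascript
// Hypothetical reason-code set, for illustration only.
const REASON_CODES = new Set([
  'bind_not_loopback',
  'keychain_unavailable',
  'integrity_mismatch',
  'health_fail',
]);

// Returns a logger that can only ever emit a member of the fixed set.
function createReasonLogger(sink) {
  return function logReason(code) {
    // Anything outside the set collapses to a constant, so a caller cannot
    // pass a token, URL, or model path through the "code" argument.
    sink(REASON_CODES.has(code) ? code : 'unknown_reason');
  };
}
```

Because the logger accepts no free-form message, an accidental `logReason(err.message)` degrades to `unknown_reason` instead of leaking whatever the error carried.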

2. Decision D5.1 — Inference loopback socket bind contract

Question: what port-selection strategy for the loopback inference listener, and what is the threat model for port fixation?

Verified state

Phase 2 §6 already specifies the bind obligations the guard cannot enforce by itself: listen(0, '127.0.0.1'), never 0.0.0.0; non-predictable ephemeral port; build allowedHosts from the actual bound port; never emit permissive CORS. Phase 4 §5.2 repeats the loopback-bind + ephemeral-port requirement for the runtime. The Phase 2 guard (verifyLoopbackRequest) is the admission control; the bind is the deferred I/O.

Decision — OS-assigned ephemeral port, loopback-only, port secrecy is NOT a control

  1. Bind listen(0, '127.0.0.1') — the OS assigns an ephemeral port. If IPv6 loopback is offered, bind [::1] explicitly; never bind 0.0.0.0, ::, or any routable interface. A bind that can only be satisfied on a non-loopback interface is a hard startup abort — never a fallback to a broader interface.
  2. Port fixation threat model (confirmed): an ephemeral port raises the cost of blind probing, but a local process can scan the ~16k-wide ephemeral range in milliseconds, and a web page can fire no-cors requests across it. Therefore the ephemeral port is defense-in-depth only; the real controls are the Phase 2 bearer token + Host allowlist + Origin/Sec-Fetch-Site check + rate limit. The design must never treat the port as a secret or as an access control. (This is why a fixed well-known port is rejected: it gives an attacker a free, stable target and a fingerprint, for zero security benefit.)
  3. Exclusive ownership: do not set SO_REUSEPORT (it would let a second local process bind the same port and split/hijack traffic). The listener owns its port for the session.
  4. Port persistence: the bound port may be recorded for the local session only, in a user-private location (file mode 0600 / per-user keychain-adjacent store), never world-readable and never logged at info level. It is not a secret, but minimizing its broadcast is free defense-in-depth.
  5. Two-listener separation (critical, see D5.4): the guarded front-door the companion exposes is distinct from the runtime's own internal listener. Both are loopback; the front-door enforces the Phase 2 guard, the back-end carries no authority. Direct access to the back-end must never bypass an authority boundary (it bypasses only the rate limiter and serves inference — acceptable because the back-end holds no secret and no data path; see D5.4/D5.8).

Fail-closed

Cannot bind loopback → abort startup (no listener, companionAvailable stays false). Port-store write fails → continue (the port is not a secret) but never broaden the bind to compensate.
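
As an illustration of items 1 and 2 together, the following sketch derives the Phase 2 Host allowlist from the address the OS actually bound and aborts on anything that is not a loopback literal. The helper name allowedHostsFor and its reason strings are assumptions, not part of the Phase 2 module; the input shape is the one Node's net.Server#address() returns for a TCP listener.

```javascript
// Loopback literals only; a wildcard or routable address is a startup abort.
const LOOPBACK_ADDRESSES = new Set(['127.0.0.1', '::1']);

// Hypothetical helper: `bound` is { address, family, port } from server.address().
function allowedHostsFor(bound) {
  if (!bound || !LOOPBACK_ADDRESSES.has(bound.address)) {
    throw new Error('bind_not_loopback'); // never fall back to a broader interface
  }
  if (!Number.isInteger(bound.port) || bound.port < 1 || bound.port > 65535) {
    throw new Error('bind_bad_port');
  }
  // IPv6 literals are bracketed when they appear in a Host header.
  const host = bound.address === '::1' ? '[::1]' : bound.address;
  return [`${host}:${bound.port}`];
}

// Intended call site (not executed here):
//   server.listen(0, '127.0.0.1', () => {
//     const allowedHosts = allowedHostsFor(server.address());
//   });
```

The allowlist is computed after the bind, from the real port, so there is no window in which the guard checks against a guessed or stale port.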


3. Decision D5.2 — OAuth redirect loopback listener bind

Question: same ephemeral strategy for the PKCE loopback redirect? Confirm it can be a different port from the inference socket.

Verified state

Phase 3 §6 requires the redirect listener bound listen(0, '127.0.0.1') (or [::1]), the redirect_uri constructed from the actual bound port and re-validated by validateRedirectUri, and the state discarded after one callback. The server-side gate (✅ DONE) confirmed the native provider accepts any loopback ephemeral port at registration and exact-matches it within one flow (RFC 8252 §7.3 + RFC 6749 §4.1.3); native-oauth-provider.mjs isLoopbackUri enforces 127.0.0.1/[::1] literals only and rejects localhost.

Decision — Separate, short-lived, one-shot ephemeral redirect listener

  1. Different listener, different port. The OAuth redirect listener is a distinct server from the inference listener, bound independently via listen(0, '127.0.0.1') (or [::1]). They are different listeners with different lifecycles, so they get different ports — confirmed and required. Confusing them (or reusing the inference port for the redirect) is forbidden.
  2. One-shot lifecycle. The redirect listener is bound only for the duration of a single authorization attempt and is torn down immediately after it receives exactly one callback (or on timeout/abort). This minimizes the window in which a local process (P-d) can race the port. PKCE already makes an intercepted code useless (Phase 3 threat a), but a one-shot listener also shrinks the race surface.
  3. Strict request handling. Accept only GET on the exact registered callback path; hand the query params to validateAuthorizationResponse({ params, expectedState, expectedIssuer }). The native provider now emits iss (server-side gate C3), so Phase 5 MUST pass expectedIssuer and treat a mismatch as fatal (full mix-up defense, Phase 3 threat c). No bearer-token check here — this listener authenticates by state + iss + PKCE, not by the loopback token.
  4. Per-attempt secrets in memory only. code_verifier, state, and nonce live in memory for the attempt; state is single-use and discarded after one callback. Never written to disk, never logged.
  5. System browser only. The authorization URL (from buildAuthorizationUrl) is opened in the OS default browser; an embedded webview is forbidden (RFC 8252 §8.12, Phase 3 threat i).

Fail-closed

Redirect listener cannot bind loopback → abort the auth attempt (no browser launch). Callback fails validateAuthorizationResponse (bad state/iss/error) → abort, generic message, never log the raw callback. Timeout with no callback → tear down the listener and surface a generic failure.
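
The one-shot and strict-request rules can be sketched as a small request handler. This is a minimal sketch under assumptions: createOneShotCallback is a hypothetical name, `validate` stands in for validateAuthorizationResponse, and its `{ ok }` return shape is assumed rather than taken from the Phase 3 module.

```javascript
// Hypothetical one-shot handler for the OAuth redirect listener.
function createOneShotCallback({ callbackPath, expectedState, expectedIssuer, validate }) {
  let spent = false;
  return function handle(method, rawUrl) {
    if (spent) return { ok: false, reason: 'listener_spent' };
    let url;
    try {
      // The base only resolves the relative request target; it is never contacted.
      url = new URL(rawUrl, 'http://127.0.0.1');
    } catch {
      return { ok: false, reason: 'bad_request' };
    }
    if (method !== 'GET' || url.pathname !== callbackPath) {
      return { ok: false, reason: 'bad_request' };
    }
    spent = true; // exactly one callback is processed; state is single-use
    const params = Object.fromEntries(url.searchParams);
    const verdict = validate({ params, expectedState, expectedIssuer });
    return verdict && verdict.ok === true
      ? { ok: true, params }
      : { ok: false, reason: 'callback_rejected' };
  };
}
```

The caller would tear down the listener as soon as `handle` returns for a request on the callback path, whether the verdict is accepted or rejected, and would never log `rawUrl` or `params`.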


4. Decision D5.3 — OS-keychain adapter surface

Question: the minimal keychain API the companion needs; per-OS backends; what does a keychain read grant?

Verified state

Phase 3 custody (companion-token-custody.mjs) is written against an injected { get, set, delete } adapter over four fixed accounts (KEYCHAIN_ACCOUNTS: accessToken, refreshToken, sessionMeta, loopbackToken), with MAX_SECRET_LEN bounds and fail-closed loads. The adapter calls are awaited (sync or Promise both work). Phase 5 supplies the real backend.

Decision — Exactly get/set/delete on four named accounts; nothing wider; device-local

  1. Minimal surface. The real adapter implements only get(account), set(account, secret), delete(account). No list, no enumerate, no getAll, no wildcard/prefix query, no "dump." A compromised or buggy adapter cannot discover what it does not already name. Unknown account names are rejected fail-closed (the adapter accepts only the four KEYCHAIN_ACCOUNTS literals); enforce MAX_SECRET_LEN; never log or return a secret in an error.
  2. Per-OS backends (least-privilege, device-local):
    • macOS — Keychain Services generic-password items, accessibility kSecAttrAccessibleWhenUnlockedThisDeviceOnly (no iCloud Keychain sync — the JWT/refresh token must not leave the device), with the item ACL bound to the companion's signed code identity where available (tightened further by Phase 7 signing).
    • Windows — DPAPI (CryptProtectData) at per-user scope (never LOCAL_MACHINE), optionally surfaced through Credential Manager; entropy parameter set per item.
    • Linux — libsecret (Secret Service API) with schema-scoped attributes in the default collection. Known limitation: the Secret Service does not isolate by application — any unlocked-session same-user process can read the collection. This is a platform constraint, not a defect we can close in the adapter; it is the primary reason the loopback token is separate, per-session, and rotated (so its blast radius is one session) and a reason Linux users who want stronger isolation should run a locked keyring.
  3. Threat (what a keychain read grants; confirmed, sobering): a successful read of these accounts yields the access-token JWT (act as the user against the hosted gateway with the scopesForRole ceiling, read and write), the refresh token (mint new JWTs until reuse-detection family-revoke trips), and the loopback token (drive the local inference front-door). That is data-plane account compromise, not merely local-inference access. Consequences and bounds:
    • The JWT is no worse than a stolen web-session JWT, and strictly better on refresh (the server-side gate backs native refresh with refresh-token-core rotation + reuse→family-revoke).
    • macOS …ThisDeviceOnly + ACL and Windows per-user DPAPI raise the bar to a same-user, same-device attacker; Linux is weaker (above).
    • The loopback token's separation means stealing it does not yield the JWT and vice-versa (Phase 3 custody §4: stored under its own account, rotated each start, independent of OAuth logout).
  4. No plaintext fallback, ever. If the OS keychain is unavailable/locked, the adapter fails closed (the operation errors and the dependent flow aborts) — it never falls back to a dotfile, env var, or plaintext store. A locked keychain means "re-auth / retry," not "store insecurely."

Fail-closed

Unknown account → reject. Keychain unavailable/locked → error (no plaintext fallback). Corrupt/oversize stored value → custody loadSession already returns null → caller treats as reauth.
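
A minimal sketch of the narrow adapter surface follows. The four account names come from KEYCHAIN_ACCOUNTS as listed above; the MAX_SECRET_LEN value, the createKeychainAdapter name, and the backend's read/write/remove methods are assumptions standing in for the real per-OS backend.

```javascript
const KEYCHAIN_ACCOUNTS = Object.freeze(['accessToken', 'refreshToken', 'sessionMeta', 'loopbackToken']);
const MAX_SECRET_LEN = 8192; // assumed bound; the real value lives in the custody module

// Hypothetical wrapper: exposes get/set/delete only, on the four fixed accounts.
function createKeychainAdapter(backend) {
  const check = (account) => {
    if (!KEYCHAIN_ACCOUNTS.includes(account)) throw new Error('keychain_unknown_account');
  };
  return Object.freeze({
    async get(account) {
      check(account);
      return backend.read(account);
    },
    async set(account, secret) {
      check(account);
      if (typeof secret !== 'string' || secret.length === 0 || secret.length > MAX_SECRET_LEN) {
        throw new Error('keychain_bad_secret'); // the secret itself never appears in the error
      }
      await backend.write(account, secret);
    },
    async delete(account) {
      check(account);
      await backend.remove(account);
    },
  });
}
```

There is deliberately no method that takes a prefix or returns more than one value, so code holding this adapter cannot enumerate the store even if the backend could.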


5. Decision D5.4 — Spawn adapter (process-management surface)

Question: the minimal spawn/kill/health-probe surface; what happens if the spawned process escapes supervision?

Verified state

Phase 4 RuntimeAdapterFns types the surface as spawn(opts) → SpawnHandle, handle.kill(), healthCheck(handle) → boolean, statResources() (the last is D5.6). SpawnOpts carries { binaryPath, modelPath, port, maxRamBytes }, with no vault/JWT/keychain field by construction. Phase 4 §5.2 mandates spawn only after integrity passes, bind 127.0.0.1, ephemeral port, and the Phase 2 guard in front.

Decision — spawn + kill + healthCheck only; hardened launch; supervised lifetime

  1. Minimal surface. spawn, kill, healthCheck — nothing else (resources = D5.6, download = D5.5). The handle exposes pid + kill() only.
  2. Hardened launch (every item is a control, not a style choice):
    • Absolute binaryPath to the bundled runtime — never a PATH lookup (prevents PATH-injection of a malicious ollama).
    • shell: false + argv array — never a concatenated command string (no shell-injection from any spec field).
    • Scrubbed environment — the child receives a minimal allowlist of env vars (HOME/TMPDIR/locale as needed); it is explicitly stripped of SESSION_SECRET, any *_API_KEY, the JWT, refresh token, loopback token, and anything keychain-related. Env is the classic ambient-authority leak; this closes it (see D5.8).
    • Loopback bind flag passed to the runtime so it binds 127.0.0.1/UDS only, plus the resource ceiling flags derived from maxRamBytes.
    • detached: false, child placed in the companion's process group so it cannot outlive an orphaning parent.
  3. The runtime back-end is not a second authority-bearing endpoint (resolves P-a/P-c at the runtime port): the companion exposes the guarded front-door (D5.1); the runtime listens behind it. Preference order for the back-end channel:
    • Preferred: a Unix-domain socket with mode 0600 (owner-only) where the runtime supports it — this makes the back-end reachable only by the companion's user and removes the loopback-TCP bypass entirely.
    • Fallback (TCP-only runtimes): bind the runtime to 127.0.0.1 on its own ephemeral port, emit no CORS, and accept the documented residual: any same-user local process (and a web page via no-cors, write-only) can spend local inference compute on it. This is acceptable only because the back-end holds no secret and no data path — it serves inference and nothing else (D5.8), so a direct hit wastes compute but exfiltrates nothing. The front-door remains the sole authenticated path; the rate limiter lives there.
  4. Escape / supervision threat model (confirmed):
    • Orphan holding the port/compute after a companion crash: on every start, the companion probes the persisted port for a stale runtime and kills it before rebinding (detect-and-reclaim). Because the loopback token is rotated each start (Phase 3 custody), a stale runtime from a prior session is inert — the new front-door's expectedToken no longer matches, and a stale back-end serves no one useful. On clean shutdown, kill the process group.
    • Poisoned runtime (model-driven RCE): the only path to spawn is a verified model (D5.5 + Phase 4 single-path-to-ready); the child runs with the scrubbed env (no secrets, no keychain, no JWT; see D5.8) and loopback/UDS bind only, so even arbitrary code in the runtime process cannot read the keychain handle (it was never in scope), cannot read the JWT (not in env), and cannot reach the canister as the user (no token). Stronger OS sandboxing (seatbelt/AppContainer/seccomp, entitlements) is Phase 7; D5.4 mandates the env-scrub + no-shell + argv + process-group + loopback-bind baseline now.
  5. Health probe goes through the same admission path the runtime serves (the front-door, with the loopback token) so there is exactly one way to reach inference; the probe is GET /v1/models / GET /api/tags. A health success drives transitionLifecycle(state,'health_ok') (Phase 4).
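
The health probe's retry budget could be expressed as follows. This is a sketch only: probeWithBudget and the default of five attempts are assumptions, the injected healthCheck stands in for the Phase 4 adapter function, and a real implementation would space the attempts out. The returned strings are the lifecycle events named above.

```javascript
// Hypothetical helper: run the injected healthCheck up to `attempts` times and
// map the outcome onto the lifecycle event to feed into transitionLifecycle.
async function probeWithBudget(healthCheck, handle, attempts = 5) {
  for (let i = 0; i < attempts; i += 1) {
    try {
      if ((await healthCheck(handle)) === true) return 'health_ok';
    } catch {
      // A thrown probe counts as one failed attempt; no detail is logged.
    }
  }
  return 'health_fail';
}
```

Only a strict `true` counts as success, so a probe that resolves to a truthy error object or a partial response still ends in `health_fail`.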

Fail-closed

Spawn fails / binary missing / integrity not yet verified → no spawn, lifecycle stays stopped, companionAvailable false. Health probe fails the retry budget → transitionLifecycle(state, 'health_fail') → kill the child → stopped.
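
The hardened-launch items can be collected into one pure function that produces the arguments for Node's child_process.spawn. This is a sketch under assumptions: buildSpawnPlan and ENV_ALLOWLIST are hypothetical names, the allowlist contents are one reading of "HOME/TMPDIR/locale as needed", the absolute-path test is a simplified stand-in for path.isAbsolute, and the runtime's argv is supplied by the caller because the real runtime flags are not specified here.

```javascript
const ENV_ALLOWLIST = ['HOME', 'TMPDIR', 'LANG', 'LC_ALL']; // assumed minimal set

// Hypothetical helper: returns { command, args, options } for child_process.spawn.
function buildSpawnPlan({ binaryPath, args }, parentEnv) {
  // Simplified absolute-path test (POSIX root or Windows drive path).
  if (typeof binaryPath !== 'string' || !/^(\/|[A-Za-z]:[\\/])/.test(binaryPath)) {
    throw new Error('spawn_binary_not_absolute'); // no PATH lookup, ever
  }
  if (!Array.isArray(args) || !args.every((a) => typeof a === 'string')) {
    throw new Error('spawn_bad_argv'); // argv array only, never a command string
  }
  // Build the child env from the allowlist; nothing else is inherited.
  const env = {};
  for (const name of ENV_ALLOWLIST) {
    if (typeof parentEnv[name] === 'string') env[name] = parentEnv[name];
  }
  return {
    command: binaryPath,
    args: [...args],
    options: { shell: false, detached: false, env },
  };
}

// Intended call site (not executed here):
//   const plan = buildSpawnPlan(opts, process.env);
//   const child = spawn(plan.command, plan.args, plan.options);
```

Building the environment by allowlist rather than by deleting known-bad names means a secret added to the parent environment later is excluded by default.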


6. Decision D5.5 — Download adapter + Phase 4 integrity wiring

Question: how does the real TLS download feed bytes into createIntegrityAccumulator; where does finalize() live; who decides the model spec (allowedSourceUrls, expectedDigest, expectedSizeBytes)?

Verified state

Phase 4 createIntegrityAccumulator({ expectedDigest, expectedSizeBytes, sourceUrl, allowedSourceUrls }) exposes update(chunk) / finalize() / getReceivedBytes() / abort(); validateSourceUrl enforces https:-only + allowlist; validateIntegritySpec enforces 64-char lowercase-hex digest + positive integer size; finalize() uses constant-time digest compare + exact size. Phase 4 §3 fixes the single path to ready: validate spec → accumulate → finalize().ok → start → spawn → health_ok. RuntimeAdapterFns.download(url, onChunk) is the dumb byte pump.

Decision — Dumb download adapter; accumulator + finalize() owned by the orchestrator; trust anchor is a first-party manifest

  1. Adapter is a dumb, verifying-transport byte pump. download(url, onChunk):
    • Performs an HTTPS-only GET with full TLS verification (rejectUnauthorized stays true; no insecure flag, ever). The url MUST have already passed validateSourceUrl against the spec's allowedSourceUrls.
    • Streams: calls onChunk(chunk) for every received byte, in order, no buffering of the whole file (multi-GB models must not be loaded into RAM).
    • Makes no integrity decision. It does not compute or compare the digest; it cannot "report success." This separation means a compromised download adapter cannot fake verification — the decision lives elsewhere.
  2. Where finalize() lives — the orchestrator, not the adapter. The companion's runtime-manager orchestration layer (the code wiring Phase 4's pure core to the real adapters) owns the accumulator:
    • Before the download: validateSourceUrl + validateIntegritySpec (fail-closed), then createIntegrityAccumulator({...spec}).
    • During: await adapter.download(spec.url, (chunk) => acc.update(chunk)).
    • After the stream ends: const verdict = acc.finalize().
    • verdict.ok === true is the precondition for transitionLifecycle(state,'start') and spawn. On verdict.ok === false: delete the downloaded file, log only the fixed reason code, and refuse to spawn (lifecycle stays stopped). This is exactly Phase 4 §3 / §5.1.
  3. Atomic, TOCTOU-aware file handling. Download to a temp path in a companion-private directory (0700), fsync, verify via finalize(), then atomically rename into the verified model path. Spawn only from the verified path. To avoid a verify-then-swap (TOCTOU) where another same-user process replaces the file between verify and spawn, keep the verified file in the companion-private directory and prefer launching from a held descriptor / re-stat the inode identity at spawn; treat any mismatch as integrity failure.
  4. Who decides the spec — the trust anchor (most security-critical answer): the model spec (allowedSourceUrls, expectedDigest, expectedSizeBytes) comes from a first-party, signed model manifest that is independent of the model download channel. Concretely, the expectedDigest must originate from a source the download/CDN attacker does not control:
    • Preferred: a manifest fetched from the Knowtation hosted gateway over TLS (a trusted first-party origin, distinct from the model CDN), and/or baked into the signed companion (Phase 7). The manifest fetch is performed by the authority group (it may carry the JWT); only the plain resolved values (url, digest, size) are then handed to the runtime group — the download adapter never sees the JWT (D5.8).
    • Forbidden: taking the digest/size from the same host that serves the model bytes (circular trust — an attacker who controls the CDN would control both bytes and "expected" digest), or from user-supplied input, or from an unauthenticated channel.
    • Manifest authenticity itself (signature scheme, key rotation) is the supply-chain detail; the binding decision here is: the trust anchor is first-party and out-of-band from the model host, fail-closed if it cannot be authenticated.
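
For the re-stat option in item 3, one possible identity check compares a stat snapshot taken at verification time with one taken immediately before spawn. The helper name sameFileIdentity and the choice of fields are assumptions; the fields themselves (dev, ino, size, mtimeMs) are those exposed on Node's fs.Stats objects.

```javascript
// Hypothetical helper: true only if every identity field is a number and is
// unchanged between the verify-time and spawn-time stat snapshots.
function sameFileIdentity(verified, atSpawn) {
  const fields = ['dev', 'ino', 'size', 'mtimeMs'];
  return fields.every(
    (f) => typeof verified[f] === 'number' && verified[f] === atSpawn[f],
  );
}
```

A `false` result would be treated as an integrity failure. This narrows the verify-then-swap window but does not close it; launching from a held descriptor, where the platform allows it, remains the stronger option.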

Fail-closed

No authenticated manifest → no download. validateSourceUrl/validateIntegritySpec fail → no download. Stream error mid-flight → acc.abort() → finalize() returns accumulator_aborted → delete temp, no spawn. finalize().ok === false (size/digest mismatch) → delete, no spawn.
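
The orchestration order above (validate, accumulate, finalize, then decide) can be sketched with every I/O function injected. This is not the Phase 4 orchestrator: downloadAndVerify, the `deps` bundle, the boolean `validate`, the `removeTemp` callback, and the `reason` field on the verdict are all assumptions for illustration, and `spec.sourceUrl` follows the accumulator's field name.

```javascript
// Hypothetical orchestrator step. The download adapter only pumps bytes;
// the verdict comes from the accumulator and nowhere else.
async function downloadAndVerify(spec, deps) {
  const { validate, createAccumulator, download, removeTemp } = deps;
  if (!validate(spec)) return { ok: false, reason: 'spec_rejected' };

  const acc = createAccumulator(spec);
  try {
    await download(spec.sourceUrl, (chunk) => acc.update(chunk));
  } catch {
    acc.abort(); // stream error: finalize() below reports the abort
  }

  const verdict = acc.finalize();
  if (verdict.ok !== true) {
    await removeTemp(); // delete the partial or mismatched file before returning
    return { ok: false, reason: verdict.reason };
  }
  return { ok: true }; // only now may the caller transition to 'start' and spawn
}
```

Note that `download` resolving successfully proves nothing: the function still returns `ok: false` unless `finalize()` says otherwise.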


7. Decision D5.6 — Resource-probe adapter

Question: what OS APIs; what privacy risk (does probing VRAM expose what other apps are doing); is it acceptable?

Verified state

Phase 4 evaluateResourceLimits(observation, limits) consumes a ResourceObservation { ramBytes, vramBytes, cpuPercent } and fails closed on malformed input. RuntimeAdapterFns.statResources() is the deferred probe. Phase 4 §4 suggests caching the observation ≤ 500 ms.

Decision — Probe the runtime's own PID; VRAM as aggregate headroom only; never enumerate other processes; no privilege escalation

  1. OS APIs (scoped to the runtime PID):
    • RAM (RSS): macOS proc_pidinfo/task_info; Linux /proc/<pid>/statm or smaps_rollup; Windows GetProcessMemoryInfo. Keyed on the runtime child's PID only — never system-wide RAM.
    • CPU%: per-process CPU time deltas for the runtime PID (same per-OS sources), sampled over an interval.
    • VRAM: the privacy-sensitive one. There is no portable per-process VRAM accounting without vendor tooling (nvidia-smi, Metal/IOKit counters, DXGI). nvidia-smi --query-compute-apps reports per-process VRAM across all GPU processes, which would expose other applications' PIDs/names/footprints.
  2. Privacy risk + decision (confirmed): probing VRAM can expose what other apps are doing on the GPU. Therefore Phase 5 MUST:
    • Read VRAM as aggregate device headroom (total/free) or the runtime PID's own usage — and discard everything else immediately. Other processes' PIDs, names, and usage are never parsed into the observation, never logged, never persisted.
    • Never enumerate the GPU process table for any purpose beyond extracting the runtime's own line / the aggregate scalar.
    • No privilege escalation for telemetry. If per-process VRAM requires elevated rights, skip it (treat vramBytes = 0 / maxVramBytes = Infinity) rather than escalate. Telemetry must never be a reason to ask for more OS privilege than inference needs.
  3. Acceptable? Yes, under (1)+(2): scoped to the runtime PID, aggregate-only VRAM, no enumeration, no escalation, nothing about other apps retained. This keeps resource enforcement (the OOM defense, Phase 4 threat b) while honoring the design gate's privacy posture.
  4. Cache the observation ≤ 500 ms (Phase 4 §4) to bound syscall overhead.

Fail-closed

Probe fails / returns malformed → evaluateResourceLimits returns malformed_observation → the per-request gate denies inference (refuse-rather-than-run-blind into OOM). A self-inflicted denial is the safe outcome; the alternative (running blind) risks the user's whole machine.
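
The "extract the runtime's own line and discard everything else" rule might look like the sketch below. It assumes the vendor tool's output has already been reduced to one `pid, MiB` pair per line; that input format, the helper name ownVramBytes, and the convention that null means "malformed" are assumptions for illustration.

```javascript
// Hypothetical helper: returns the runtime's own VRAM usage in bytes,
// 0 if the runtime has no row, or null if its row is malformed.
function ownVramBytes(csvText, runtimePid) {
  if (!Number.isInteger(runtimePid) || runtimePid <= 0) return null;
  for (const line of csvText.split('\n')) {
    const [pidField, mibField] = line.split(',').map((s) => s.trim());
    // Rows for other processes are skipped; nothing about them is kept.
    if (pidField !== String(runtimePid)) continue;
    const mib = Number(mibField);
    return mibField !== '' && Number.isFinite(mib) && mib >= 0
      ? mib * 1024 * 1024
      : null;
  }
  return 0; // the runtime holds no GPU memory
}
```

A null result would be passed on as a malformed observation, so the per-request gate denies inference rather than running without a VRAM reading it was expecting.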


8. Decision D5.7 — Phase 1 seam activation (companionAvailable)

Question: when exactly does companionAvailable flip to true? (Must be after integrity verified and lifecycle ready and a health round-trip succeeds.)

Verified state

Phase 1 lib/model-runtime-lane.mjs: selectLane returns 'local' when inBrowserAvailable || companionAvailable; companionAvailable defaults false (fail-closed) and is documented as "set by Phase 5 only when canServeInference(lifecycle) is true." Phase 4 §3 step 8 + §5.3 step 10 fix the ordering: set true only after health_ok, false on drain/stop.

Decision — True only when ALL of {integrity-verified ∧ lifecycle ready ∧ recent health round-trip} hold; false on any doubt

  1. Flip to true only when every condition holds simultaneously:
    • Integrity verified: the model's acc.finalize().ok === true for the running model (D5.5).
    • Lifecycle ready: canServeInference(lifecycle) === true, reached only via stopped → starting → ready on a successful health_ok (Phase 4 — there is no direct stopped → ready).
    • Real health round-trip: at least one end-to-end probe through the guarded front-door (loopback token presented, admitted, runtime answered correctly) has succeeded — not merely "the process spawned."
    • Plus: the front-door listener is bound and the loopback token is stored (rotateLoopbackToken done), so an admitted caller can actually be served.
  2. Recency bound (anti-staleness, P-i): companionAvailable must be backed by a recent successful health round-trip. Phase 5 re-probes on an interval; if the last success is older than the threshold (or a probe fails), treat the flag as false until re-confirmed. This stops selectLane from routing to a silently-dead runtime.
  3. Flip to false immediately on any of: drain/stop (Phase 4 transitionLifecycle), health_fail, a resource-limit-triggered drain, detected runtime crash/exit, keychain/loopback-token loss, or companion shutdown. Default and ambiguity → false. Never set true optimistically.
  4. Scope of the flag. companionAvailable means "local inference is reachable and ready on this device." It does not assert auth/consent — write-back of enrichment still passes Phase 1's enforceConsentPolicy and needs a valid session (D5.3 custody decide() !== 'reauth'). Keeping the flag scoped to inference-readiness avoids conflating lane capability with lane permission.
  5. Mechanism. Phase 5's binding layer computes the flag strictly from canServeInference(lifecycle) ∧ recent-health-ok and writes it into the live LaneCapabilities passed to selectLane. The value is never cached beyond the recency bound.
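
The conjunction in (1) and the recency bound in (2) can be sketched as one pure predicate. The state shape and the name `computeCompanionAvailable` are hypothetical, and the staleness threshold is a parameter because this document does not fix its value.

```javascript
// Hypothetical predicate: true only when every condition holds at once.
// Any missing, non-boolean, or stale field evaluates to false.
function computeCompanionAvailable(state, nowMs, maxHealthAgeMs) {
  if (state === null || typeof state !== "object") return false;
  // NaN when lastHealthOkAtMs is absent, which fails the checks below.
  const healthAgeMs = nowMs - state.lastHealthOkAtMs;
  return (
    state.integrityOk === true &&         // acc.finalize().ok for the running model
    state.lifecycleReady === true &&      // canServeInference(lifecycle)
    state.frontDoorBound === true &&      // guarded listener is bound
    state.loopbackTokenStored === true && // rotateLoopbackToken completed
    Number.isFinite(healthAgeMs) &&
    healthAgeMs >= 0 &&
    healthAgeMs <= maxHealthAgeMs         // recent end-to-end health success
  );
}
```

Because the function takes `nowMs` as an argument and holds no state, re-evaluating it before each selectLane call keeps the flag from outliving the recency bound.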

Fail-closed

Any missing condition, stale health, or ambiguity → companionAvailable = false → selectLane falls through the chain (in-browser → self_hosted → … → disabled), exactly the D2.2 fallback.


9. Decision D5.8 — No-ambient-authority enforcement mechanism

Question: what mechanism prevents the spawn/download adapter from ever holding a vault handle, JWT, or keychain read capability?

Verified state

Phase 4 already guarantees the decision core imports no vault/canister/keychain/auth module and that RuntimeAdapterFns carries no authority accessor (structural). Design gate §4.6 forbids ambient authority on the loopback endpoint. The gap Phase 5 closes: the real adapters and the wiring layer must preserve that separation when authority objects (keychain, JWT) finally exist in the same process.

Decision — Object-capability segregation, enforced by tests, not convention

  1. Two disjoint capability groups, constructed in separate scopes:
    • Authority group (holds secrets/handles): the OS-keychain adapter (D5.3), the OAuth/session controller (JWT + refresh via companion-token-custody / companion-oauth-pkce), and the canister/vault client. Instantiated and held only by the session/auth controller.
    • Runtime group (no secrets): the RuntimeAdapterFns implementations — spawn, download, healthCheck, statResources. Constructed with no reference to any authority-group object.
  2. The runtime group receives only inert data: a verified file path, a port, a validated URL, and resource limits. It is never passed the keychain adapter, the JWT, the refresh token, or a canister handle. The model manifest fetch (which may need the JWT) is done by the authority group, which hands the runtime group only the resolved { url, digest, size } (D5.5) — so the download adapter sees a URL, never a token.
  3. Environment scrub is part of the capability boundary (D5.4): the spawned child's env is a minimal allowlist with SESSION_SECRET, *_API_KEY, JWT, refresh/loopback tokens, and keychain references removed — closing the env-as-ambient-authority leak that an import-graph check alone would miss.
  4. Enforced, not merely intended — Phase 5's 7-tier suite MUST include:
    • an architecture/import test asserting the runtime-adapter module and companion-runtime-manager import none of { companion-token-custody, companion-oauth-pkce, keychain backend, canister/vault client };
    • a child-env-scrub security test asserting the spawned process environment contains none of the secret-bearing keys;
    • a surface test asserting RuntimeAdapterFns and SpawnOpts/SpawnHandle expose no authority accessor (Phase 4 already; re-assert against the real impl);
    • a download-adapter test asserting it receives only a URL + chunk sink, never a token.
  5. The Phase 2 guard remains the only admission path and its verdict the only output — an admitted inference request reaches the runtime and nothing else; it cannot pivot to vault/JWT because those handles do not exist anywhere reachable from the runtime group (structural, now test-enforced).
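
The environment scrub in (3) and the child-env-scrub test in (4) can be sketched as an allowlist builder plus a checker. The allowlist contents and the name patterns below are assumptions made for illustration; the authoritative list is the one fixed by D5.4.

```javascript
// Hypothetical allowlist: only these names may reach the child process.
const ENV_ALLOWLIST = ["PATH", "HOME", "TMPDIR", "LANG"];

// Defense in depth: names matching these never pass, even if someone
// adds them to the allowlist by mistake.
const SECRET_PATTERNS = [/SECRET/i, /_API_KEY$/i, /JWT/i, /TOKEN/i, /KEYCHAIN/i];

function looksSecret(name) {
  return SECRET_PATTERNS.some((pattern) => pattern.test(name));
}

// Builds the child's env from scratch instead of deleting keys from the
// parent's, so an unknown secret-bearing variable is excluded by default.
function buildChildEnv(parentEnv, allowlist = ENV_ALLOWLIST) {
  const childEnv = {};
  for (const name of allowlist) {
    const value = parentEnv[name];
    if (typeof value !== "string" || looksSecret(name)) continue;
    childEnv[name] = value;
  }
  return childEnv;
}

// What a child-env-scrub test could assert is empty.
function findSecretKeys(env) {
  return Object.keys(env).filter(looksSecret);
}
```

Constructing the environment from an allowlist, rather than removing known-bad keys, means a secret added to the parent process later stays out of the child without any change to this code.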

Fail-closed

If the wiring cannot construct the runtime group without an authority reference (e.g., a refactor introduces a shared singleton), the architecture test fails the build — the merge is blocked, not shipped with a warning.
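
A build-blocking architecture test could start from a check like the following sketch. It is a regex scan over module source text, so it sees only direct imports and may flag a specifier that appears inside a comment or string; a false positive of that kind fails the build, which is the safe direction. The fragment list and function names are hypothetical.

```javascript
// Hypothetical fragments identifying authority-group modules.
const FORBIDDEN_FRAGMENTS = [
  "companion-token-custody",
  "companion-oauth-pkce",
  "keychain",
  "canister",
  "vault",
];

// Collects specifiers from `… from "x"`, bare `import "x"`, and
// dynamic `import("x")` forms.
function importSpecifiers(source) {
  const patterns = [
    /\bfrom\s*["']([^"']+)["']/g,
    /\bimport\s*\(?\s*["']([^"']+)["']/g,
  ];
  return patterns.flatMap((pattern) =>
    [...source.matchAll(pattern)].map((match) => match[1])
  );
}

// Returns the offending specifiers; a test asserts this list is empty
// for every runtime-group module.
function forbiddenImports(source, forbidden = FORBIDDEN_FRAGMENTS) {
  return importSpecifiers(source).filter((specifier) =>
    forbidden.some((fragment) => specifier.includes(fragment))
  );
}
```

A complete version would also walk transitive imports, since a shared singleton can reach the runtime group through an intermediate module.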


10. How Phase 5 discharges the prior phases' deferred obligations

| Source obligation | Discharged by |
| --- | --- |
| Phase 2 §6 (1) loopback bind, (2) ephemeral port, (4) allowedHosts from bound port, (7) no permissive CORS | D5.1 |
| Phase 2 §6 (3) CSPRNG per-session token to keychain | D5.3 + Phase 3 rotateLoopbackToken |
| Phase 3 §6 (1) system browser, (2) loopback redirect bind + ephemeral port, (4) callback validation w/ expectedIssuer, (5) TLS token POST, (6) keychain custody, (7) refresh drive, (8) loopback-token rotate | D5.2 (browser, redirect bind, callback), D5.3 (keychain), and the orchestration calling Phase 3's pure descriptors |
| Phase 4 §5.1 download + integrity, §5.2 spawn, §5.3 health loop, §5.4 per-request gate, §5.6 minimal logging, §5.7 no ambient authority | D5.5 (download/integrity), D5.4 (spawn/health), D5.6 (resource probe), D5.8 (no ambient authority) |
| Phase 4 §3 step 8 / §8 G2: Phase 1 seam activation | D5.7 |
| Server-side OAuth gate (✅ DONE): native client at /api/v1/auth/native, iss emission, loopback variable-port, scope ceiling | Consumed by D5.2 (the companion is the native client; passes expectedIssuer, binds the loopback redirect) |

Remaining external dependency: none for first-party run-from-source. The server-side gate (G1) is DONE; this document is G2 (Phase 4 §8). Phase 5 implementation may proceed against this contract.


11. 7-tier test obligations (Phase 5 bind/lifecycle layer)

Aaron's Rule #0. The pure cores' suites (Phases 2–4: 102 + 100 + 35 + 219 cases) do not absolve the bind layer of its own tests. Before any merge to main, the Phase 5 shell ships all seven tiers:

| Tier | Focus |
| --- | --- |
| Unit | Bind helpers (loopback-only assertion, ephemeral-port allocation, allowedHosts from bound port); keychain adapter per backend (get/set/delete on the four accounts; unknown-account reject; no-list surface); spawn-opts hardening (absolute path, shell:false, argv, env-scrub); download adapter HTTPS-only + chunk pump; resource probe PID-scoping; companionAvailable predicate. |
| Integration | OAuth PKCE loopback round-trip against the native provider (browser-open stubbed) → keychain custody; download → createIntegrityAccumulator → finalize() → start → spawn → health_ok → companionAvailable=true; guard-in-front-of-runtime request path; refresh rotation → updateAccessToken; reuse → clearSession. |
| End-to-end | Sign in → fetch manifest (first-party) → download → verify → spawn → enrich a note locally via the guarded front-door → result handled per §5/D3 policy; failure branches: integrity fail (no spawn), health fail (stopped), keychain locked (reauth). |
| Stress | Concurrent inference through the front-door at maxInFlight/queueBound; many auth attempts (redirect-listener bind/teardown churn); stale-runtime reclaim across forced restarts; large streamed download to the accumulator. |
| Data-integrity | Provenance fields on derived artifacts (deferred write-back is Phase 6, but the inference result's metadata is asserted); finalize() rejects 1-bit corruption end-to-end; loopback token rotates each start (old token inert); no secret persisted outside the keychain. |
| Performance | Front-door admission overhead bound; runtime cold-start; resource-probe ≤ 500 ms cache honored; no event-loop starvation under streamed download. |
| Security (centerpiece) | Loopback-only bind (reject 0.0.0.0/::/routable); ephemeral-port not used as a control; DNS-rebinding + cross-origin still 403 at the front-door; runtime back-end carries no authority and emits no CORS; keychain read surface minimal + device-local (no iCloud sync); child-env contains no secret; architecture/import test: runtime group imports no authority module; manifest trust-anchor is out-of-band from the model host; resource probe never enumerates other GPU processes; companionAvailable fail-closed; no secret in any log/error/redirect/adapter interface; global fail-closed posture. |
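
As one concrete instance of the Unit and Security tiers, the loopback-only bind assertion might look like the sketch below. The function name is hypothetical, and the set of accepted literals (127.0.0.0/8 and ::1) is an assumption of this sketch; the authoritative rule is D5.1. Hostnames such as "localhost" are rejected here because they depend on name resolution rather than being address literals.

```javascript
// Hypothetical bind helper: accept only loopback address literals.
function isLoopbackBindAddress(host) {
  if (typeof host !== "string") return false;
  if (host === "::1") return true;
  // IPv4 loopback is the whole 127.0.0.0/8 block.
  const match = /^127\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  return match !== null && match.slice(1).every((octet) => Number(octet) <= 255);
}

// Fail-closed guard to call before listen(): throws instead of binding.
function assertLoopbackBind(host) {
  if (!isLoopbackBindAddress(host)) {
    throw new Error("refusing to bind: address is not a loopback literal");
  }
  return host;
}
```

The error message deliberately omits the rejected value, in line with the rule that nothing sensitive appears in errors or logs.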

12. Constraints honored

  • Decisions only — no companion shell code. This document writes none; it fixes the contract the implementation must obey.
  • Muse-canonical, on feat/companion-app, paired with the Phase 1–4 code already there — not a docs-only PR to main.
  • Fail-closed on every ambiguous design point (bind, custody, spawn, download, probe, seam).
  • Security first; no ambient authority; no secret in any log, error, or adapter interface.
  • No assumptions stated as fact — every cross-reference is anchored to a verified file/section in Phases 1–4 and the server-side OAuth gate.
  • Phase 5 lifts only the bounded I/O subset of the design gate's "DOES NOT approve" list (§0); packaging/signing/notarization/auto-update remain Phase 7 and are not approved here.

13. Approval table

| Decision | Recommendation | Owner approval |
| --- | --- | --- |
| D5.1 (inference loopback bind): OS-assigned ephemeral, loopback-only, port-secrecy not a control, no SO_REUSEPORT, two-listener separation | ACCEPT | ☐ pending |
| D5.2 (OAuth redirect): separate, one-shot, ephemeral loopback listener; pass expectedIssuer; system browser only | ACCEPT | ☐ pending |
| D5.3 (keychain adapter): get/set/delete on four fixed accounts only; device-local (macOS ThisDeviceOnly / Windows per-user DPAPI / Linux libsecret w/ documented limit); no plaintext fallback | ACCEPT | ☐ pending |
| D5.4 (spawn adapter): spawn/kill/healthCheck; absolute path, shell:false, argv, env-scrub, process-group; runtime back-end via UDS 0600 (else loopback TCP, no authority); detect-and-reclaim stale runtime | ACCEPT | ☐ pending |
| D5.5 (download adapter): dumb HTTPS-only byte pump; accumulator + finalize() owned by orchestrator; trust anchor = first-party signed manifest, out-of-band from the model host; atomic temp→verified, TOCTOU-aware | ACCEPT | ☐ pending |
| D5.6 (resource probe): runtime-PID-scoped; VRAM aggregate-only, never enumerate other processes; no privilege escalation; fail-closed deny on probe failure | ACCEPT | ☐ pending |
| D5.7 (companionAvailable): true only when integrity-verified ∧ ready ∧ recent health round-trip; recency-bounded; false on any doubt | ACCEPT | ☐ pending |
| D5.8 (no ambient authority): object-capability segregation (authority vs runtime groups), env-scrub, enforced by architecture/import + env-scrub tests (build-blocking) | ACCEPT | ☐ pending |

On owner approval of D5.1–D5.8, the Phase 5 implementation (companion shell, run-from-source) is unblocked — itself gated on the §11 7-tier test obligation before any merge to main. Phase 7 (packaging, signing, notarization, auto-update integrity) remains a separate, later gate and is not approved by this document.
