Companion App — Phase 4: Bundled Runtime Manager (Decision Core)
Status: accepted design + implementation (pure decision core; no runtime spawned, no download performed, no socket bound).
Branch: feat/companion-app (Muse-canonical; not a docs-only PR to main).
Phase table ref: Gate §12, Phase 4 — ⚡ Sonnet/auto. "Well-specified engineering once the seams exist."
Security exception: the model-download INTEGRITY path (supply-chain verification) was treated with extra rigour per the session brief — see §1 (adversarial/threat note).
Depends on: Phase 0 Decision Record (gate §13, D1–D3 accepted), Phase 1 (lib/model-runtime-lane.mjs), Phase 2 (lib/companion-loopback-guard.mjs), Phase 3 (lib/companion-oauth-pkce.mjs, lib/companion-token-custody.mjs).
Upstream: COMPANION-APP-DESIGN-AND-AUTHORIZATION-GATE.md §7 (packaging/distribution), §4.6 (no ambient authority), §10 (7-tier test obligations); COMPANION-APP-MODEL-ROUTING-AND-ENRICHMENT-ARCHITECTURE.md §2 (companion = bridge + bundled runtime), §3 (client-side constraint).
Simple summary
The companion app (Phase 5+) will bundle a local AI model (like Ollama or llama.cpp) that runs entirely on your machine. Before that bundled runtime ever starts, this phase builds the rules engine that controls its entire life:
- Download safety: before any model file is executed, we verify its SHA-256 fingerprint and exact size against a known-good record. A tampered or corrupted model is silently rejected — the runtime never starts on unverified bytes.
- Lifecycle gate: the runtime goes through clearly defined states (stopped → starting → ready → draining → stopped). The only state that allows inference is ready. Every other state is a hard block.
- Backpressure: if too many AI requests pile up at once, the rules engine starts saying "queue this" (capacity full) or "reject this" (queue full too). Nothing overflows.
- Resource ceiling: if the runtime starts eating too much RAM, VRAM, or CPU, new requests are refused until the pressure drops.
Critically, Phase 4 builds these rules as pure, I/O-free functions — no real runtime is spawned, no file is downloaded, no socket is opened. Phase 5 (a separately approved bind gate) will wire the real Ollama/llama.cpp spawn, the real TLS download, and the real OS resource probe into this decision core via injected adapters.
Technical summary
Phase 4 delivers lib/companion-runtime-manager.mjs — a pure, I/O-free decision core for the bundled runtime manager — and a 7-tier test suite (219 cases, all green).
The module enforces gate §4 item 6 ("no ambient authority") structurally: it imports no vault, canister, keychain, or auth module. Its sole output interface is decision verdicts. The injected adapter interface (RuntimeAdapterFns) is typed to model-lifecycle operations only; no data path exists through this seam.
This scope is deliberate and gate-compliant. The gate's "DOES NOT approve (no code)" list forbids "shipping any companion binary, tray helper, installer, auto-updater, or bundled runtime" and "opening any new local HTTP listener." Phase 4 ships none of these. The actual spawn, download, and bind are Phase 5's responsibility and require an explicit gate.
1. Adversarial / threat note
(a) SUPPLY-CHAIN — tampered/poisoned model file
Threat. A network-layer attacker (MITM, compromised CDN mirror, DNS hijack on the download host) can substitute a malicious model binary. Even with TLS, a compromised upstream server can serve a malicious file with a valid cert. A poisoned model binary runs arbitrary computation inside the companion process, with access to the same OS session as the real model.
Controls (built into Phase 4):
- The model spec carries expectedDigest (SHA-256 hex, lowercase, 64 chars exactly) AND expectedSizeBytes. Both must match before any execution path is permitted.
- Source URL is checked against an explicit allowedSourceUrls allowlist AND must be https: scheme. HTTP is structurally banned: a model spec specifying an HTTP source is rejected at spec-validation time, not at download time.
- The integrity accumulator (createIntegrityAccumulator) feeds every downloaded byte to a SHA-256 hash and accumulates receivedBytes. finalize() uses constant-time comparison for the digest (double-SHA-256 via crypto.timingSafeEqual to avoid length/content timing oracles) and an exact numeric equality check for size.
- Fail-closed on every ambiguous spec field: missing digest, empty allowlist, zero expected size, unrecognised scheme → reject before any download starts.
- The reason strings returned by integrity verdicts are fixed constants — no model path, URL, or digest value ever appears in a reason string.
(b) RESOURCE EXHAUSTION — inference flood → OOM
Threat. A caller (or a runaway application loop) can enqueue unbounded inference requests, exhausting device RAM/VRAM and killing the user's system.
Controls (built into Phase 4):
- maxInFlight cap: evaluateAdmission returns an at_capacity denial when inFlight >= maxInFlight. The request may be queued but is never dispatched directly.
- queueBound cap: queued (pending) requests are bounded. A queue_full rejection is returned when the in-flight cap is reached and queued >= queueBound. The queue cannot grow without bound.
- Resource-limit policy: evaluateResourceLimits rejects when any ceiling is exceeded. Phase 5's injected resource probe supplies the real observation; Phase 4's pure evaluator enforces the ceiling.
- evaluateRuntimeRequest is the single admission gate: it checks lifecycle, admission, and resource limits in order. All three gates must pass before a request proceeds.
(c) AMBIENT AUTHORITY — runtime must never reach vault/canister/JWT
Threat. The companion runtime, if misconfigured or compromised, could pivot from inference to data exfiltration through a shared authority handle.
Controls (structural):
- The module exports only decision and lifecycle functions. It has no imports of any vault, canister, keychain, or auth module.
- The injected adapter interface (RuntimeAdapterFns) carries only: spawn, download, healthCheck, statResources. No vault accessor, no JWT emitter, no session reader.
- Even if Phase 5's real adapter is compromised, it cannot reach data through this seam because the seam is typed to model-lifecycle operations only.
- No model path, binary path, or download URL appears in any reason string or verdict output.
2. Module contract — lib/companion-runtime-manager.mjs
2.1 Design constraints (security invariants)
- PURE. No I/O, no process.env reads, no child_process, no network, no filesystem, no logging, no clock reads. Every input is passed explicitly.
- FAIL-CLOSED. Any missing, malformed, ambiguous, or unrecognised input → DENY. No fail-open branch exists.
- NO AMBIENT AUTHORITY. No vault, canister, keychain, or auth import. The injected adapter interface is the only I/O boundary.
- NO SECRET IN OUTPUT. All reason codes are fixed RUNTIME_MANAGER_REASONS constants. No model path, download URL, SHA-256 digest, or access token appears in any reason, return value, or thrown error.
- SUPPLY-CHAIN INTEGRITY. A model file MUST pass SHA-256 + size verification via the integrity accumulator BEFORE the lifecycle can transition to ready and before canServeInference returns true.
2.2 RUNTIME_MANAGER_REASONS
A frozen object of fixed string reason codes. Every returned { ok, reason } verdict uses a value from this object. The codes are:
Integrity: ok | malformed_spec | source_not_allowed | scheme_not_allowed |
size_mismatch | digest_mismatch | accumulator_finalized | accumulator_aborted
Lifecycle: invalid_transition | not_ready | unknown_event | unknown_state
Admission: malformed_admission_state | at_capacity | queue_full | no_in_flight_to_complete
Resources: malformed_limits | malformed_observation | ram_over_limit | vram_over_limit | cpu_over_limit
Top-level: malformed_request_params
2.3 Supply-chain integrity — createIntegrityAccumulator (streaming)
Signature:
createIntegrityAccumulator({ expectedDigest, expectedSizeBytes, sourceUrl, allowedSourceUrls })
→ { update(chunk: Uint8Array), finalize(): IntegrityVerdict, getReceivedBytes(): number, abort() }
Creates a streaming SHA-256 accumulator. update(chunk) feeds each received chunk. finalize() verifies the total size (exact byte count) and the SHA-256 digest using constant-time comparison. The accumulator is single-use: after finalize() or abort(), further calls return a fixed reason.
Throws (fail at creation time) if the spec or source URL fails validation.
PHASE 5 OBLIGATION: finalize().ok MUST be true before Phase 5 calls transitionLifecycle(state, 'start'). If finalize().ok is false, the downloaded file must be deleted and execution refused.
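The accumulator contract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the module's source: sketchAccumulator and the REASONS table are hypothetical names, spec and source-URL validation at creation time are omitted, and ignoring update() calls after finalize() or abort() is an assumption (the text only says later calls return a fixed reason).

```javascript
import { createHash, timingSafeEqual } from 'node:crypto';

// Hypothetical stand-in for the relevant RUNTIME_MANAGER_REASONS entries.
const REASONS = Object.freeze({
  OK: 'ok',
  SIZE_MISMATCH: 'size_mismatch',
  DIGEST_MISMATCH: 'digest_mismatch',
  ACCUMULATOR_FINALIZED: 'accumulator_finalized',
  ACCUMULATOR_ABORTED: 'accumulator_aborted',
});

// Hash both hex strings again so timingSafeEqual always receives two
// 32-byte buffers, whatever the length of the inputs.
function digestsEqual(a, b) {
  const ha = createHash('sha256').update(String(a), 'utf8').digest();
  const hb = createHash('sha256').update(String(b), 'utf8').digest();
  return timingSafeEqual(ha, hb);
}

function sketchAccumulator({ expectedDigest, expectedSizeBytes }) {
  const hash = createHash('sha256');
  let receivedBytes = 0;
  let status = 'open'; // 'open' | 'finalized' | 'aborted'

  return {
    update(chunk) {
      if (status !== 'open') return; // assumption: late chunks are ignored
      hash.update(chunk);
      receivedBytes += chunk.byteLength;
    },
    finalize() {
      if (status === 'finalized') return { ok: false, reason: REASONS.ACCUMULATOR_FINALIZED };
      if (status === 'aborted') return { ok: false, reason: REASONS.ACCUMULATOR_ABORTED };
      status = 'finalized';
      // Size is checked with exact numeric equality before the digest.
      if (receivedBytes !== expectedSizeBytes) return { ok: false, reason: REASONS.SIZE_MISMATCH };
      return digestsEqual(hash.digest('hex'), expectedDigest)
        ? { ok: true, reason: REASONS.OK }
        : { ok: false, reason: REASONS.DIGEST_MISMATCH };
    },
    getReceivedBytes: () => receivedBytes,
    abort() {
      if (status === 'open') status = 'aborted';
    },
  };
}
```

Because the verdict carries only a fixed reason code, a caller can log it directly without risking a digest or path leak.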
2.4 Supply-chain integrity — verifyModelBytes (in-memory)
Signature:
verifyModelBytes({ fileData, expectedDigest, expectedSizeBytes, sourceUrl, allowedSourceUrls })
→ IntegrityVerdict
Verifies an already-downloaded model held entirely in memory. Suitable for small models and testing. For large models, Phase 5 should use createIntegrityAccumulator with streaming.
2.5 Source validation
- validateSourceUrl(url, allowedUrls) → { ok, reason }: validates scheme (https: only) and allowlist membership. Fail-closed.
- validateIntegritySpec(expectedDigest, expectedSizeBytes) → { ok, reason }: validates the 64-char lowercase hex digest and positive integer size.
- ALLOWED_SOURCE_SCHEMES = new Set(['https:']): HTTP is banned structurally.
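A minimal sketch of the two validators, assuming exact string matching against the allowlist and assuming which reason code each failure maps to (the text does not pin either down). The function names carry a sketch prefix to mark them as illustrations, and the example URL is hypothetical.

```javascript
const ALLOWED_SOURCE_SCHEMES = new Set(['https:']);
const DIGEST_PATTERN = /^[0-9a-f]{64}$/; // lowercase hex, exactly 64 chars

function sketchValidateSourceUrl(url, allowedUrls) {
  if (typeof url !== 'string' || !Array.isArray(allowedUrls)) {
    return { ok: false, reason: 'malformed_spec' };
  }
  let parsed;
  try {
    parsed = new URL(url);
  } catch {
    return { ok: false, reason: 'malformed_spec' };
  }
  // Scheme is checked first, so an http: URL fails even if it is allowlisted.
  if (!ALLOWED_SOURCE_SCHEMES.has(parsed.protocol)) {
    return { ok: false, reason: 'scheme_not_allowed' };
  }
  // An empty allowlist matches nothing, which keeps the check fail-closed.
  if (!allowedUrls.includes(url)) {
    return { ok: false, reason: 'source_not_allowed' };
  }
  return { ok: true, reason: 'ok' };
}

function sketchValidateIntegritySpec(expectedDigest, expectedSizeBytes) {
  const digestOk = typeof expectedDigest === 'string' && DIGEST_PATTERN.test(expectedDigest);
  const sizeOk = Number.isSafeInteger(expectedSizeBytes) && expectedSizeBytes > 0;
  return digestOk && sizeOk
    ? { ok: true, reason: 'ok' }
    : { ok: false, reason: 'malformed_spec' };
}
```

Both functions return a verdict rather than throwing, so the caller can reject a spec before any download starts.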
2.6 Lifecycle state machine
States: stopped | starting | ready | draining
Valid transitions:
stopped + start → starting
starting + health_ok → ready
starting + health_fail → stopped
ready + drain → draining
draining + stopped → stopped
Every other (state, event) pair is invalid → { ok: false, reason: 'invalid_transition' }.
Key functions:
- createLifecycleState() → initial { state: 'stopped' }.
- transitionLifecycle(currentState, event) → { ok, newState, reason? }. Pure; input not mutated.
- canServeInference(state) → boolean. Returns true ONLY for state.state === 'ready'. Branchless, no coercion.
Security invariant: The only path to ready is stopped → starting → ready via a successful health_ok after a start. There is no direct stopped → ready transition. Phase 5's health-check loop drives this.
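The transition table maps naturally onto a frozen lookup object. The sketch below is illustrative only: the function names are hypothetical, returning the unchanged input as newState on failure is an assumption, and so is the split between unknown_state, unknown_event, and invalid_transition.

```javascript
// One row per state; each row lists the only events that state accepts.
const TRANSITIONS = Object.freeze({
  stopped: Object.freeze({ start: 'starting' }),
  starting: Object.freeze({ health_ok: 'ready', health_fail: 'stopped' }),
  ready: Object.freeze({ drain: 'draining' }),
  draining: Object.freeze({ stopped: 'stopped' }),
});
const EVENTS = new Set(['start', 'health_ok', 'health_fail', 'drain', 'stopped']);

const createLifecycleStateSketch = () => Object.freeze({ state: 'stopped' });

function transitionLifecycleSketch(current, event) {
  const from = current?.state;
  const known = typeof from === 'string'
    && Object.prototype.hasOwnProperty.call(TRANSITIONS, from);
  if (!known) return { ok: false, newState: current, reason: 'unknown_state' };
  if (!EVENTS.has(event)) return { ok: false, newState: current, reason: 'unknown_event' };
  const to = TRANSITIONS[from][event];
  if (to === undefined) return { ok: false, newState: current, reason: 'invalid_transition' };
  // A new frozen object is returned; the input is never mutated.
  return { ok: true, newState: Object.freeze({ state: to }) };
}

// Strict equality only: no truthiness, no coercion.
const canServeInferenceSketch = (s) => s != null && s.state === 'ready';
```

Since stopped has no health_ok entry, there is no way to reach ready without passing through starting.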
2.7 Backpressure / concurrency admission
Types:
AdmissionState: { maxInFlight, queueBound, inFlight, queued }
Key functions:
- createAdmissionState({ maxInFlight, queueBound }): fails closed unless both params are positive integers.
- evaluateAdmission(state) → { ok: true } (slot free) | { ok: false, reason: 'at_capacity' } (full, can queue) | { ok: false, reason: 'queue_full' } (both full, reject).
- recordInFlight(state) → new state (increments inFlight). Pure; input not mutated.
- recordCompletion(state) → new state (decrements inFlight). Throws if inFlight <= 0.
- recordQueued(state) / recordDequeued(state) → new state (queue counter management).
Backpressure contract: evaluateRuntimeRequest does NOT side-effect the admission state. The caller calls evaluateAdmission, and only if ok === true, advances to recordInFlight before dispatching.
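The admission functions can be sketched as pure transformations over a frozen counter object. This is an illustration under assumptions: the names carry a Sketch suffix to mark them as hypothetical, and throwing from the constructor is one possible reading of "fails closed".

```javascript
const isCount = (n) => Number.isSafeInteger(n) && n >= 0;

function createAdmissionStateSketch({ maxInFlight, queueBound }) {
  if (!isCount(maxInFlight) || maxInFlight === 0 || !isCount(queueBound) || queueBound === 0) {
    throw new TypeError('malformed_admission_state');
  }
  return Object.freeze({ maxInFlight, queueBound, inFlight: 0, queued: 0 });
}

function evaluateAdmissionSketch(s) {
  const wellFormed = s != null
    && isCount(s.inFlight) && isCount(s.queued)
    && isCount(s.maxInFlight) && s.maxInFlight > 0
    && isCount(s.queueBound) && s.queueBound > 0;
  if (!wellFormed) return { ok: false, reason: 'malformed_admission_state' };
  if (s.inFlight < s.maxInFlight) return { ok: true };
  if (s.queued < s.queueBound) return { ok: false, reason: 'at_capacity' };
  return { ok: false, reason: 'queue_full' };
}

// Every recorder returns a new frozen state and leaves its input untouched.
const recordInFlightSketch = (s) => Object.freeze({ ...s, inFlight: s.inFlight + 1 });
const recordQueuedSketch = (s) => Object.freeze({ ...s, queued: s.queued + 1 });

function recordCompletionSketch(s) {
  if (s.inFlight <= 0) throw new RangeError('no_in_flight_to_complete');
  return Object.freeze({ ...s, inFlight: s.inFlight - 1 });
}
```

The caller owns the mutable reference: it evaluates first, then replaces its state with the recorder's return value only after an ok verdict.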
2.8 Resource-limit policy
Types:
ResourceLimits: { maxRamBytes, maxVramBytes, maxCpuPercent }
ResourceObservation: { ramBytes, vramBytes, cpuPercent }
Key functions:
- createResourceLimits({ maxRamBytes, maxVramBytes, maxCpuPercent }): all must be positive finite numbers; maxCpuPercent must be in (0, 100]. Throws on violation.
- evaluateResourceLimits(observation, limits) → { ok, reason }. Checks RAM first, then VRAM, then CPU. Fail-closed on malformed inputs. The numeric observation values never appear in the returned reason string.
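A compact sketch of the ceiling check follows. It assumes "exceeded" means strictly greater than the limit and that an observation may legitimately report zero; neither detail is stated in the text, and the function names are hypothetical.

```javascript
const isPositiveFinite = (n) => typeof n === 'number' && Number.isFinite(n) && n > 0;
const isNonNegativeFinite = (n) => typeof n === 'number' && Number.isFinite(n) && n >= 0;

function createResourceLimitsSketch({ maxRamBytes, maxVramBytes, maxCpuPercent }) {
  const valid = isPositiveFinite(maxRamBytes)
    && isPositiveFinite(maxVramBytes)
    && isPositiveFinite(maxCpuPercent)
    && maxCpuPercent <= 100;
  if (!valid) throw new TypeError('malformed_limits');
  return Object.freeze({ maxRamBytes, maxVramBytes, maxCpuPercent });
}

function evaluateResourceLimitsSketch(observation, limits) {
  const limitsOk = limits != null
    && [limits.maxRamBytes, limits.maxVramBytes, limits.maxCpuPercent].every(isPositiveFinite);
  if (!limitsOk) return { ok: false, reason: 'malformed_limits' };
  const observationOk = observation != null
    && [observation.ramBytes, observation.vramBytes, observation.cpuPercent].every(isNonNegativeFinite);
  if (!observationOk) return { ok: false, reason: 'malformed_observation' };
  // Fixed order: RAM, then VRAM, then CPU. Only the first breach is reported.
  if (observation.ramBytes > limits.maxRamBytes) return { ok: false, reason: 'ram_over_limit' };
  if (observation.vramBytes > limits.maxVramBytes) return { ok: false, reason: 'vram_over_limit' };
  if (observation.cpuPercent > limits.maxCpuPercent) return { ok: false, reason: 'cpu_over_limit' };
  return { ok: true, reason: 'ok' };
}
```

The verdict contains a reason code only, so observed byte counts never leave the evaluator.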
2.9 Top-level admission gate — evaluateRuntimeRequest
Signature:
evaluateRuntimeRequest({ lifecycleState, admissionState, resourceObservation, resourceLimits })
→ { ok: boolean, reason: string }
Evaluation order (must not be reordered):
1. Lifecycle gate: canServeInference(lifecycleState) → if false, NOT_READY.
2. Admission gate: evaluateAdmission(admissionState) → if not ok, propagate reason.
3. Resource-limit gate: evaluateResourceLimits(resourceObservation, resourceLimits) → if not ok, propagate reason.
Security: Never throws on any input (try/catch converts any unexpected error to MALFORMED_REQUEST_PARAMS). The returned verdict has exactly two fields: ok and reason. No secret, path, or numeric value from inputs appears in the output.
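The ordering and the never-throws guarantee can be illustrated as follows. To keep the block self-contained, the three gates are passed in through a hypothetical factory, makeRuntimeGateSketch; the real evaluateRuntimeRequest presumably calls the module's own functions directly.

```javascript
function makeRuntimeGateSketch({ canServeInference, evaluateAdmission, evaluateResourceLimits }) {
  return function evaluateRuntimeRequestSketch(params) {
    try {
      const { lifecycleState, admissionState, resourceObservation, resourceLimits } = params;

      // 1. Lifecycle gate: anything other than a strict true is a denial.
      if (canServeInference(lifecycleState) !== true) {
        return { ok: false, reason: 'not_ready' };
      }
      // 2. Admission gate: propagate the gate's own fixed reason code.
      const admission = evaluateAdmission(admissionState);
      if (admission.ok !== true) {
        return { ok: false, reason: admission.reason };
      }
      // 3. Resource-limit gate.
      const resources = evaluateResourceLimits(resourceObservation, resourceLimits);
      if (resources.ok !== true) {
        return { ok: false, reason: resources.reason };
      }
      return { ok: true, reason: 'ok' };
    } catch {
      // Any unexpected error (null params, throwing gate) becomes a denial.
      return { ok: false, reason: 'malformed_request_params' };
    }
  };
}
```

Building a fresh two-field object on every path is what keeps extra fields from a gate's verdict out of the output.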
2.10 Injected adapter interface (RuntimeAdapterFns)
Documented as JSDoc typedef in the module; no implementation is provided in Phase 4:
@typedef RuntimeAdapterFns {
spawn(opts: SpawnOpts): Promise<SpawnHandle>
download(url: string, onChunk: (chunk: Uint8Array) => void): Promise<void>
healthCheck(handle: SpawnHandle): Promise<boolean>
statResources(): Promise<ResourceObservation>
}
Security: the adapter interface carries no vault accessor, no JWT emitter, no session reader. Phase 5 must honour this boundary when implementing the real adapter.
3. Lifecycle/integrity interaction — the single path to ready
Phase 5 orchestration (pseudocode; every call into this module is pure, while the adapter.* calls are Phase 5 I/O):
1. spec = registry.lookup(modelId)
validateSourceUrl(spec.url, config.allowedSourceUrls) // fail-closed
validateIntegritySpec(spec.digest, spec.sizeBytes) // fail-closed
2. acc = createIntegrityAccumulator({ ...spec })
await adapter.download(spec.url, chunk => acc.update(chunk))
verdict = acc.finalize()
if (!verdict.ok) → delete downloaded file; STOP. lifecycle stays 'stopped'.
3. lifecycle = transitionLifecycle(lifecycle, 'start') // stopped → starting
handle = await adapter.spawn({ binaryPath, modelPath, port, maxRamBytes })
4. healthy = await adapter.healthCheck(handle) // with retry loop
lifecycle = transitionLifecycle(lifecycle,
healthy ? 'health_ok' : 'health_fail')
5. if (lifecycle.state !== 'ready') → STOP. // never serves
6. // Per-request gate:
decision = evaluateRuntimeRequest({ lifecycleState: lifecycle, admissionState, ... })
if (!decision.ok) → return busy/capacity-exceeded to caller
admissionState = recordInFlight(admissionState)
// ... dispatch to runtime ...
admissionState = recordCompletion(admissionState)
7. // Shutdown:
lifecycle = transitionLifecycle(lifecycle, 'drain') // ready → draining
await handle.kill()
lifecycle = transitionLifecycle(lifecycle, 'stopped') // draining → stopped
8. // Phase 1 seam: companionAvailable = canServeInference(lifecycle)
// Set to true ONLY when lifecycle.state === 'ready' (step 4 onward).
// Set back to false when drain/stop are triggered (step 7).
Phase 1 seam: companionAvailable in LaneCapabilities (lib/model-runtime-lane.mjs) is set to true by Phase 5 only when canServeInference(lifecycle) returns true. This is the seam the Phase 1 design specified: Phase 5 is the authority that sets this field after a runtime health-check passes.
4. Backpressure and resource enforcement rules
Backpressure
| State | evaluateAdmission result | Phase 5 action |
|---|---|---|
| inFlight < maxInFlight | ok: true | Dispatch immediately; call recordInFlight. |
| inFlight >= maxInFlight AND queued < queueBound | at_capacity | Enqueue; call recordQueued. When a slot opens (recordCompletion), dequeue, recordDequeued, recordInFlight, dispatch. |
| inFlight >= maxInFlight AND queued >= queueBound | queue_full | Reject with 503 (runtime busy). Do not enqueue. |
The admission state is immutable in this module. Phase 5 maintains the mutable reference and advances it by replacing it with the return value of recordInFlight/recordCompletion/recordQueued/recordDequeued.
Resource limits
Phase 5 calls adapter.statResources() before each evaluateRuntimeRequest call to get the current ResourceObservation. Recommended: cache the observation for at most 500ms to avoid stat syscall overhead on every inference request.
If evaluateResourceLimits returns RAM_OVER_LIMIT or VRAM_OVER_LIMIT, Phase 5 may trigger a graceful drain (LIFECYCLE_EVENTS.DRAIN) and restart with a lower maxRamBytes CLI flag.
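The 500 ms cache recommended above could look like the sketch below. createCachedProbe is a hypothetical Phase 5 helper, not part of the Phase 4 module; the clock is injected so that the helper stays testable, and concurrent callers during a refresh each trigger their own probe in this simplified version.

```javascript
// Wraps an async resource probe so that callers reuse a recent observation.
// `probe` stands in for adapter.statResources; `now` is an injected clock.
function createCachedProbe(probe, { maxAgeMs = 500, now = Date.now } = {}) {
  let cached = null; // { value, at } once the first probe has succeeded

  return async function readObservation() {
    const t = now();
    if (cached !== null && t - cached.at <= maxAgeMs) {
      return cached.value;
    }
    // A probe failure propagates to the caller; an expired entry is never
    // served as a fallback, which keeps the behaviour fail-closed.
    const value = await probe();
    cached = { value, at: t };
    return value;
  };
}
```

A typical call site would await readObservation() and pass the result to evaluateRuntimeRequest as resourceObservation.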
5. What Phase 5 must do to bind the runtime safely
The pure decision core (Phase 4) is the bouncer. Phase 5 (companion shell) installs the door. Binding the runtime process is the single most security-critical action and requires an explicit Phase 5 gate. When Phase 5 binds, it MUST:
5.1 Model download and integrity (supply-chain gate)
- Validate the model spec before starting the download: validateSourceUrl + validateIntegritySpec. Reject immediately on any failure.
- Create the accumulator before the download begins. Feed every received byte to acc.update(chunk) via the onChunk callback.
- acc.finalize() MUST return ok: true before the model file is executed or the lifecycle is started. On ok: false: delete the downloaded file, log only the fixed reason code, and refuse to call transitionLifecycle(state, 'start').
- Use streaming download via the injected adapter.download to avoid loading entire multi-GB model files into memory. The accumulator is designed for streaming (feeding it 1 byte at a time still yields a correct verdict).
5.2 Runtime spawn
- Spawn only after integrity passes. Call adapter.spawn({ binaryPath, modelPath, port, maxRamBytes }) using the verified model path.
- Bind to 127.0.0.1 only (per Phase 2 §4.5). Pass the loopback bind flag to Ollama/llama.cpp's CLI.
- Allocate a non-predictable ephemeral port (same principle as Phase 2 loopback guard port).
- Wire the Phase 2 loopback guard around the spawned runtime's port: the guard from lib/companion-loopback-guard.mjs sits in front of the runtime. The runtime is what admitted requests reach (Phase 2 §6).
5.3 Health-check loop
- Run the health-check retry loop after spawn. For Ollama: GET /api/tags; for llama.cpp: GET /health or GET /v1/models.
- Call transitionLifecycle(lifecycle, 'health_ok') on first success. After this, canServeInference(lifecycle) returns true, and Phase 5 may then set companionAvailable = true in the LaneCapabilities it supplies to selectLane (Phase 1 seam).
- On repeated failure, call transitionLifecycle(lifecycle, 'health_fail'). This returns the lifecycle to stopped. The runtime process should be killed. Phase 5 may retry from the start event with exponential backoff.
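One way to shape the retry loop is to have it return the lifecycle event to feed into transitionLifecycle. The helper below is a hypothetical sketch: the attempt count, the base delay, and the use of exponential backoff between probes are all assumptions (the text only mentions backoff for retries from the start event), and sleep is injectable so the loop can be exercised without real timers.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Probes the runtime until it reports healthy or the attempts run out.
// `healthCheck` stands in for adapter.healthCheck.
async function runHealthCheckLoop(healthCheck, handle, options = {}) {
  const { maxAttempts = 5, baseDelayMs = 100, sleep = defaultSleep } = options;

  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    let healthy = false;
    try {
      // Only a strict `true` counts; a thrown error is treated as unhealthy.
      healthy = (await healthCheck(handle)) === true;
    } catch {
      healthy = false;
    }
    if (healthy) return 'health_ok';
    // Delays grow as 100, 200, 400, ... ms; no wait after the last attempt.
    if (attempt < maxAttempts - 1) {
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  return 'health_fail';
}
```

The caller would then run transitionLifecycle(lifecycle, event) with the returned event and kill the process on health_fail.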
5.4 Per-request gate
- Before every inference request: call adapter.statResources() to get the current ResourceObservation, then call evaluateRuntimeRequest(...). On ok: false, return the appropriate error to the caller; do not forward to the runtime.
- Advance admission state by calling recordInFlight(admissionState) before dispatching, and recordCompletion(admissionState) when the response completes.
5.5 Wire shape the runtime must speak
The bundled runtime must speak the OpenAI-compatible HTTP wire format on http://127.0.0.1:<port>:
- POST /v1/chat/completions with { model, messages, max_tokens } body, used by callOpenAiCompat in lib/daemon-llm.mjs.
- GET /v1/models or GET /api/tags (Ollama) for the health-check round-trip.
This is the same wire shape already used by lib/daemon-llm.mjs and lib/llm-complete.mjs (Ollama provider). Phase 5 can reuse those call paths with base_url = 'http://127.0.0.1:<port>'.
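For illustration, the request described above could be assembled as follows. buildChatCompletionRequest is a hypothetical helper, not the callOpenAiCompat implementation, and it omits whatever headers the Phase 2 loopback guard requires, since those are specified elsewhere.

```javascript
// Builds the URL and fetch-style init object for a loopback chat completion.
function buildChatCompletionRequest({ port, model, messages, maxTokens }) {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError('port out of range');
  }
  if (typeof model !== 'string' || !Array.isArray(messages)) {
    throw new TypeError('model must be a string and messages an array');
  }
  return {
    // Loopback only: the host is fixed and never taken from caller input.
    url: `http://127.0.0.1:${port}/v1/chat/completions`,
    init: {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ model, messages, max_tokens: maxTokens }),
    },
  };
}
```

The result can be passed straight to fetch(url, init) once the per-request gate has returned ok: true.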
5.6 Minimal logging (§4.8 gate control)
- Log only verdict reason codes from evaluateRuntimeRequest and transitionLifecycle. Never log: model path, binary path, download URL, SHA-256 digest, per-session token, JWT, or inference request body.
- The Phase 2 loopback guard handles auth logging for the endpoint; Phase 5's runtime listener delegates to it.
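One way to make the logging rule hard to violate is to format log lines through an allowlist, so that anything that is not a known reason code is replaced before it reaches the log. The helper and its two placeholder strings (unknown_source, unrecognised_reason) are hypothetical; the reason list mirrors the codes in section 2.2.

```javascript
const LOGGABLE_REASONS = new Set([
  'ok', 'malformed_spec', 'source_not_allowed', 'scheme_not_allowed',
  'size_mismatch', 'digest_mismatch', 'accumulator_finalized', 'accumulator_aborted',
  'invalid_transition', 'not_ready', 'unknown_event', 'unknown_state',
  'malformed_admission_state', 'at_capacity', 'queue_full', 'no_in_flight_to_complete',
  'malformed_limits', 'malformed_observation', 'ram_over_limit', 'vram_over_limit',
  'cpu_over_limit', 'malformed_request_params',
]);
const LOGGABLE_SOURCES = new Set(['evaluateRuntimeRequest', 'transitionLifecycle']);

// Returns a log line built only from allowlisted tokens. Free-form strings
// (paths, URLs, digests) can never pass through, even by mistake.
function toLogLine(source, verdict) {
  const safeSource = LOGGABLE_SOURCES.has(source) ? source : 'unknown_source';
  const safeReason = verdict != null && LOGGABLE_REASONS.has(verdict.reason)
    ? verdict.reason
    : 'unrecognised_reason';
  return `${safeSource} ok=${verdict?.ok === true} reason=${safeReason}`;
}
```

A transition verdict that succeeds carries no reason, so it is logged with the placeholder rather than with any state detail.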
5.7 No ambient authority
- The runtime adapter interface (RuntimeAdapterFns) must be implemented with no reference to vault, canister, JWT, or keychain handles. The adapter is scoped to: spawn a process, download a file, probe health, probe resources. Any authority expansion requires a new gate.
6. Test obligations satisfied (gate §10, 7 tiers)
All under test/companion-runtime-manager-*.test.mjs (219 cases, all green):
| Tier | File | Focus |
|---|---|---|
| Unit | …-unit.test.mjs | Each exported function in isolation; all lifecycle transitions (valid + invalid); integrity spec/source validation; admission bounds; resource limit evaluations; evaluateRuntimeRequest gate ordering. |
| Integration | …-integration.test.mjs | Combined flows: integrity → lifecycle cold-start; health-fail path; drain sequence; admission cycling; resource + admission combined; streaming accumulator vs. in-memory parity. |
| End-to-end | …-e2e.test.mjs | Realistic full session with stub adapters: download → verify → start → serve → drain. Failure branches: integrity failure, health-check failure, resource exhaustion, draining rejects new inference. |
| Stress | …-stress.test.mjs | 10k lifecycle round-trips; backpressure trips at exact maxInFlight (100) and queueBound; 50k admission evaluations; 100KB integrity accumulator with 1000 chunks; 1-byte corruption detection; 20k resource limit evaluations; 10k evaluateRuntimeRequest calls. |
| Data-integrity | …-data-integrity.test.mjs | Determinism (1000 calls per function); no input mutation; all reasons in RUNTIME_MANAGER_REASONS; canServeInference strictly state-gated; lifecycle transition table completeness + soundness (all valid + all invalid combinations). |
| Performance | …-performance.test.mjs | 10k evaluateRuntimeRequest < 500ms; mean < 0.05ms; 50k evaluateAdmission < 500ms; 10k lifecycle round-trips < 200ms; 100KB 1-byte-chunk accumulation < 500ms; 1MB 4KB-chunk accumulation < 200ms. |
| Security | …-security.test.mjs | Centerpiece: wrong/missing digest rejects before execution; accumulator rejects corrupted data (1-bit flip); oversized download rejected; HTTP source banned at spec-validation time; foreign-source URL rejected; empty allowlist fail-closed; lifecycle gate blocks all non-ready states; backpressure trips at exact bound (100 in-flight); 1000-request flood blocked when not ready; RAM/VRAM/CPU over-limit rejected; no ambient authority in exports; evaluateRuntimeRequest verdict has only { ok, reason } (no embedded data); no secret/URL/digest in any reason string; constant-time comparison (5× timing ratio bound); global fail-closed posture (null/undefined on all inputs). |
7. Deferred (explicitly not Phase 4)
- Real child_process.spawn of Ollama/llama.cpp: Phase 5 (bind gate).
- Real TLS download over HTTPS: Phase 5 (the injected adapter.download).
- OS resource probe (adapter.statResources(): real process.memoryUsage(), /proc/meminfo, or nvidia-smi): Phase 5.
- OS-keychain read of the per-session loopback token (needed to compare with Phase 2 guard's expectedToken): Phase 5.
- Phase 1 seam activation (setting companionAvailable = true in the live LaneCapabilities object): Phase 5 (it calls canServeInference(lifecycle) after health_ok).
- Binary bundling, code signing, notarization, auto-update: Phase 7 (distribution gate).
- Multi-device fallback (phone has no companion → embeddings-only): Phase 8.
- Scooling ModelRuntimeAdapter wiring: Phase 10.
8. Remaining blockers to Phase 5
With Phase 4 complete, the decision core is fully built and tested. The remaining blockers to Phase 5 (companion app shell — the first phase that opens any socket or spawns any process) are:
| Blocker | Description |
|---|---|
| G1 — Server-side OAuth gate | The companion's client_id must be registered with the hosted Knowtation OAuth provider. Phase 3's pure PKCE core is ready; the server-side registration is the remaining external dependency. Without a registered client_id, the Phase 5 companion cannot complete the PKCE flow against the real authorization server. |
| G2 — Phase 5 bind-gate design | An explicit Phase 5 gate document must specify: (a) the socket bind contract (loopback, ephemeral port — per Phase 2 §6), (b) the OAuth loopback redirect listener bind, (c) the real OS-keychain adapter (Keychain/DPAPI/libsecret), (d) the real child_process.spawn adapter for Ollama/llama.cpp, (e) the real download adapter over TLS, (f) the real resource probe adapter. Phase 5 is where all deferred I/O from Phases 2, 3, and 4 converges into a single companion process. |
Recommended order:
- G1 first (OAuth gate) — it is a server-side configuration decision that does not require any code, and it unblocks the real auth round-trip for the Phase 5 companion.
- G2 second (Phase 5 bind-gate design) — design the Phase 5 shell with a thinking model (it converges Phase 2's socket bind, Phase 3's redirect listener, and Phase 4's spawn/download), then implement with Sonnet/auto once the seam contract is fixed.
Both G1 and G2 must be resolved before any companion binary is shipped (gate §12 Phase 5).