# Companion App — Design & Authorization Gate

**Status:** design + authorization gate. **No companion runtime code is implemented or approved by this document.**

**Layer:** Knowtation / Muse substrate (Scooling and other ecosystem products consume it via `ModelRuntimeAdapter`).

**Upstream brief:** [`COMPANION-APP-MODEL-ROUTING-AND-ENRICHMENT-ARCHITECTURE.md`](COMPANION-APP-MODEL-ROUTING-AND-ENRICHMENT-ARCHITECTURE.md) (§2 companion, §3 client-side constraint, §5 OAuth, §6 billing, §8.1 localhost security, §8.2 derived-artifact paradox, §10/§12 item 3).

**Related code:** `hub/bridge/server.mjs` (the service the companion evolves from), `lib/llm-complete.mjs` (provider lanes), `lib/daemon-llm.mjs` (OpenAI-compatible local/remote routing).

---

## Simple Summary

The companion app is a small, optional background helper (think menu-bar / system-tray app, like the Ollama helper) that lets a person run AI **on their own computer** so their private notes never leave the device. It signs in with the same Knowtation login, downloads a local model, and exposes that model **only to programs already running on the same machine** (the browser tab or the companion itself).

A cloud server **cannot** reach a model on your laptop, so the model must be called from your side. The cloud keeps doing what it already does — store data, check who you are, handle permissions and billing, and sync — and never touches the local model.

This document does two things:

1. It **specifies** how the companion should be built and, most importantly, **how to secure the local model endpoint** so a malicious web page cannot quietly use it (the real risk).
2. It is an **authorization gate**: it records what is accepted as a design and what is **not yet approved to build**. No companion runtime ships on the strength of this document.
It also records that one **non-companion, low-risk** piece was implemented alongside it on the same branch — the **OpenRouter "bring-your-own-key" model lane** in `lib/llm-complete.mjs` — because the brief (§12 item 2) explicitly green-lit it as a self-contained model-routing addition. It is fully tested and changes nothing for existing deployments.

## Technical Summary

The companion is an **evolution of the existing `hub/bridge` Node service** plus a **bundled local inference runtime** (e.g. Ollama / llama.cpp). It authenticates as a **native/public OAuth client** using **PKCE + loopback redirect** (no client secret on device), stores the resulting JWT in the **OS keychain**, and acts against the hosted gateway/canister with **identical identity and scopes** to the web app. The hosted gateway/canister continues to serve data, identity, permissions, billing, and sync; it **never proxies local inference** (brief §3 hard constraint — the cloud cannot reach `localhost`).

The security-critical surface is the **loopback model endpoint**. Binding to `127.0.0.1` is *not* sufficient: any web page in the user's browser can issue requests to `http://127.0.0.1:<port>`, and **DNS-rebinding** can make an attacker-controlled domain resolve to `127.0.0.1`, so that page's same-origin requests reach the endpoint. The endpoint must therefore enforce a **per-session bearer token + strict `Host`/`Origin` allowlisting + non-predictable port + no permissive CORS**, and treat note bodies as **untrusted data** (prompt-injection threat model, §8.3 of the brief).

This gate **accepts the design and the security model** and **defers implementation** until its explicit dependencies are accepted: hosted tenancy with the owner-vs-member billing/consent rule, the brief §4 model-routing lane matrix, and the per-tier derived-artifact storage decision. The future implementation must satisfy the test obligations in §10 (Aaron's 7-tier standard) before any merge to `main`.
---

## Review Decision (Authorization Gate)

### This gate ACCEPTS (design only)

- The **architecture**: companion = `hub/bridge` evolution + bundled local runtime; cloud serves data/identity/permissions/billing/sync and never proxies local inference (brief §3).
- The **OAuth model**: native/public client, PKCE + loopback redirect, no device-side client secret, JWT in OS keychain, same scopes as the web session (brief §5).
- The **localhost endpoint security model** in [§4](#4-localhost-endpoint-security-model-the-core-of-this-gate) as the binding requirement for any future implementation.
- The **derived-artifact storage policy per privacy tier** in [§5](#5-derived-artifact-storage-paradox-resolution).
- The **test obligations** in [§10](#10-test-obligations-7-tier-for-the-future-implementation) as a merge precondition for the future implementation.

### This gate DOES NOT approve (no code)

- Shipping any companion binary, tray helper, installer, auto-updater, or bundled runtime.
- Opening any new local HTTP listener / loopback model endpoint in any repo.
- New canister routes, new Hub REST endpoints, new DB tables, or wire-protocol changes for the companion.
- Storing derived artifacts (`ai_summary`, embeddings, insight events) under any new storage path or encryption scheme.
- Any change to OAuth client registration or scopes.
- Pulling the companion ahead of its dependencies (see next section).

### Hard dependencies (must be accepted BEFORE companion implementation)

1. **Hosted tenancy/teams** (brief §10A): auto-provisioned workspace owner, hosted role store, invites — and the **owner-vs-member billing + consent** rule. The companion's "may a member's local companion enrich an owner's notes?" question (brief §8.7) cannot be answered until this lands.
2. **Model-routing lane matrix** (brief §4) confirmed, including the client-side-inference constraint and the default-lane selection logic.
3. **Derived-artifact storage decision per privacy tier** (brief §8.2) — see [§5](#5-derived-artifact-storage-paradox-resolution).

---

## 1. Scope and non-goals

**In scope (design):** companion topology, OAuth/PKCE flow, the loopback endpoint security model, the derived-artifact storage policy, packaging/distribution shape, and the consumption contract for Scooling via `ModelRuntimeAdapter`.

**Out of scope (this gate):** any runtime code, installers, signing/notarization pipelines, the ZK tier (tracked separately in the brief §9), and the model-training path (Unsloth), which is explicitly distinct from inference infra (brief §10 item 2).

## 2. Architecture

```
┌── User's machine ───────────────────────────────────────────┐
│                                                             │
│  Browser tab (web session JWT)     Companion app            │
│   │ in-browser WebGPU              │ (tray helper)          │
│   │ (light private tasks)          │                        │
│   ▼                                ├── OAuth PKCE ───────► system browser ──► Knowtation OAuth
│  WebGPU model                      ├── JWT in OS keychain                     (Google/GitHub)
│                                    ├── bundled local runtime (Ollama/llama.cpp)
│                                    └── loopback model endpoint 127.0.0.1:<port>
│                                        ▲ token + Host/Origin allowlist
│                                        │
│   local model calls (client-side) ─────┘                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘
        │ data / identity / permissions / billing / sync (JWT)
        ▼
  Hosted gateway / canister ── NEVER proxies local inference (§3)
```

- The companion **reuses the bridge's** auth/token handling, role/scope resolution, and canister client. It **adds** a bundled runtime and a guarded loopback endpoint.
- **Model calls route client-side; data routes through the hosted gateway/canister** (brief §3 design rule).

## 3. OAuth (native/public client, PKCE + loopback)

- Companion opens the **system browser** and runs the standard Knowtation Google/GitHub OAuth flow with **PKCE** (RFC 7636) and a **loopback redirect** (`http://127.0.0.1:<port>/callback`, RFC 8252). **No client secret** is embedded in the distributed binary.
- On success it receives the **same JWT** the web app gets and stores it in the **OS keychain** (Keychain / DPAPI / libsecret). It then acts as the user against the hosted gateway/canister with identical scopes.
- The **local model endpoint requires no separate login** — it is a loopback-only service bound to the authenticated session (secured per §4). In-browser inference reuses the existing web session (no extra auth).

## 4. Localhost endpoint security model (the core of this gate)

Binding to `127.0.0.1` is **necessary but not sufficient**. The endpoint MUST enforce **all** of:

1. **Bearer token on every request.** A high-entropy, per-session token is generated at companion start, stored in the OS keychain, and required on every call to the loopback endpoint. Requests without the exact token are rejected with `401` before any model work.
2. **Strict `Host` header allowlist.** Accept only the `127.0.0.1:<port>` / `localhost:<port>` literals. Reject any other `Host` value with `403` — this is the primary **DNS-rebinding** defense (a rebound domain presents an attacker `Host`).
3. **Strict `Origin`/`Sec-Fetch-Site` checks.** Reject cross-site browser origins. **No wildcard CORS**, no `Access-Control-Allow-Origin: *`, and no reflecting arbitrary `Origin`.
4. **Non-predictable ephemeral port**, not a fixed well-known port, to raise the cost of blind probing (defense-in-depth, never the sole control).
5. **Loopback bind only** (`127.0.0.1`, never `0.0.0.0`).
6. **No ambient authority.** The endpoint exposes only model inference; it never exposes vault read/write, the canister client, or the stored JWT.
7. **Untrusted-input handling.** Note bodies are passed to the model strictly as **data**, never as instructions or as a source of headers/URLs (prompt-injection threat model, brief §8.3).
8. **Rate limiting + minimal logging.** Bound the request rate; never log the token, the JWT, or note bodies.

A future implementation that omits any of items 1–3, 5, or 6 fails this gate.

## 5. Derived-artifact storage paradox resolution

If inference runs privately on-device but `ai_summary` / embeddings / insight events are written to the **cloud canister**, the derived content has effectively left the device (brief §8.2). Policy by tier:

| Privacy tier | Where derived artifacts live | Rationale |
| --- | --- | --- |
| **Convenience** (server holds key) | Cloud canister, as today | No additional privacy claim; full server-side features. |
| **Privacy-max / ZK** (user holds key) | **Local-only, or client-encrypted before upload** | Derived content must not be readable by the host; aligns with the ZK tier (brief §9). |

The ZK encryption hierarchy itself is **out of scope here** (brief §9 owns it). This gate only fixes the **policy**: privacy-max derived artifacts are never stored as host-readable plaintext.

## 6. Provenance

Derived artifacts produced via the companion record `generated_by`, `model`, `version`, `date`, and `source_event_id` (brief §8.4). Re-enrichment is triggered on model upgrade. This is a **provenance flag, not a lifecycle state** — it must not force notes through the proposal pipeline (brief §7.3).

## 7. Packaging / distribution (design intent, not approved to build)

- Shape: a **tray/background helper** that bundles the local runtime; auto-update channel; code signing/notarization per OS; least-privilege OS permissions for the runtime.
- Multi-device (brief §8.5): phone (no WebGPU/companion) vs laptop (companion) — compute where capable; cached-result location follows the §5 storage policy.
- Offline/fallback (brief §8.6): companion offline or device incapable → graceful fallback (in-browser, managed-with-consent, or embeddings-only) and later re-sync of cached enrichment.

## 8. Scooling consumption contract

Scooling consumes the companion lane **only** through `ModelRuntimeAdapter` (no Scooling-specific inference infra, no separate model billing).
A Scooling managed-lane call is a metered event against the user's **Knowtation** packs; local/in-browser/BYO lanes are **not** metered (brief §6).

## 9. The OpenRouter lane (implemented on this branch — model-routing precursor)

Per brief §12 item 2, the **OpenRouter provider lane** was added to `lib/llm-complete.mjs` as a self-contained, low-risk addition (OpenAI-compatible wire format, same shape as the existing DeepInfra path). It is **not** the companion and does not depend on this gate's deferred items.

- **Activation:** `KNOWTATION_CHAT_PROVIDER=openrouter` + `OPENROUTER_API_KEY` (BYO key).
- **Model:** `config.llm.openrouter_chat_model` → `OPENROUTER_CHAT_MODEL` → default `openai/gpt-4o-mini`.
- **Optional attribution:** `OPENROUTER_SITE_URL` → `HTTP-Referer`, `OPENROUTER_APP_TITLE` → `X-Title` (sent only when set).
- **Privacy/billing rule (enforced + tested):** **no silent fallback** to a managed lane on failure — a BYO-key failure surfaces rather than re-routing note text to a metered provider (brief §4/§6).
- **Backward compatibility (enforced + tested):** OpenRouter is **explicit-only**; adding `OPENROUTER_API_KEY` alone never changes the provider for an existing deployment.
- **UI:** OpenRouter is already a selectable provider in the Hub **Settings → Consolidation** chat-provider dropdown (`web/hub/index.html`, with the `https://openrouter.ai/api/v1` base-URL field via `lib/daemon-llm.mjs`). The new lane wires the same provider into the `completeChat` path used by MCP summarize and Hub proposal LLM jobs. Env documented in `.env.example`.
- **Tests:** 7 tiers under `test/llm-complete-openrouter-*.test.mjs` (32 cases): unit, integration, e2e, stress, data-integrity, performance, security.

> Note: there is **no** existing Hub UI that lets a user pick the `completeChat`/`KNOWTATION_CHAT_PROVIDER`
> provider (DeepInfra/OpenAI/Anthropic/Ollama are env-selected, not UI-selected). The brief's phrasing
> "expose it in the integrations UI alongside the existing DeepInfra/OpenAI/Anthropic/Ollama options"
> describes a UI surface that does not exist for this code path; the truthful exposure is the
> consolidation provider dropdown (already lists OpenRouter) plus `.env.example`. A dedicated
> chat-provider settings UI, if desired, is a separate follow-up.

## 10. Test obligations (7-tier) for the future implementation

When the companion is approved and implemented, each component (loopback endpoint, OAuth/PKCE flow, runtime manager) must ship with the full 7-tier suite before any merge to `main`:

1. **Unit** — token check, `Host`/`Origin` allowlist, port binding, model adapter.
2. **Integration** — OAuth PKCE loopback round-trip; endpoint + runtime; keychain read/write.
3. **End-to-end** — sign in → download model → enrich a note locally → result handled per §5 policy.
4. **Stress** — concurrent inference requests; runtime backpressure; many auth attempts.
5. **Data-integrity** — derived-artifact provenance fields; no plaintext leak in privacy-max tier.
6. **Performance** — endpoint overhead bounds; runtime cold-start; no event-loop starvation.
7. **Security** — DNS-rebinding rejection, cross-origin rejection, missing/invalid token rejection, no ambient authority, prompt-injection (note body as data), no secret in logs/errors.

## 11. Deferred / open questions (carried from the brief)

- Owner-vs-member: whose packs, whose consent, may a member's companion enrich an owner's notes (§8.7) — blocked on tenancy.
- Consent & data lifecycle for auto-enrichment, stricter for minors/classrooms; retention/deletion of derived artifacts (§8.8).
- Quality/eval loop for cheap/local enrichment (§8.9).
- Abuse/quota on the managed lane (§8.10).
- Distribution/signing/auto-update specifics (§8.11).

## 12. Build phases & model-tier guidance

This section is a **roadmap, not an approval**.
The gate's "DOES NOT approve" list still holds: no companion runtime code starts until **Phase 0** resolves the three hard dependencies (tenancy/consent, lane matrix, storage decision). Phases are sequenced; later phases assume the earlier seams are accepted.

### Model-tier legend

- 🧠 **Thinking model** (extended-reasoning, e.g. a high-thinking model) — use wherever a subtle mistake becomes a **security hole, privacy breach, cryptographic weakness, or wrong multi-tenant policy**. These phases involve adversarial reasoning, protocol/crypto correctness, or consent rules where "looks right" is not good enough.
- ⚡ **Sonnet / automatic** — implementation against an **already-accepted design**: plumbing, UI wiring, runtime lifecycle, packaging mechanics, routine test tiers. Cursor automatic model mode is appropriate here.
- 🔀 **Hybrid** — **design/spec the seam with a thinking model, then implement with Sonnet/auto.** Use a thinking model for the interface contract and threat surface, switch to Sonnet/auto for the body once the contract is fixed.

### Phase table

| # | Phase | Depends on | Model tier |
| --- | --- | --- | --- |
| 0 | **Decision gates** — resolve the three hard dependencies: hosted tenancy + owner-vs-member billing/consent (brief §8.7), the brief §4 lane matrix + default-lane logic, the per-tier derived-artifact storage decision (§5). Output: accepted decisions, not code. | — | 🧠 Thinking |
| 1 | **`ModelRuntimeAdapter` seam + lane matrix** — the abstraction Hub/Scooling consume; lane selection (managed / in-browser / companion / BYO) and metering boundary (§8). | 0 | 🔀 Hybrid |
| 2 | **Loopback endpoint security core** — per-session bearer token, `Host`/`Origin` allowlist, DNS-rebinding defense, non-predictable port, loopback bind, no ambient authority, untrusted-input handling (§4 items 1–8). | 1 | 🧠 Thinking |
| 3 | **OAuth native/public client** — PKCE + loopback redirect (RFC 7636/8252), no device-side secret, JWT in OS keychain (Keychain/DPAPI/libsecret) (§3). | 1 | 🧠 Thinking |
| 4 | **Bundled runtime manager** — Ollama/llama.cpp lifecycle, model download/verify, cold-start, backpressure, resource limits (§7). | 1 | ⚡ Sonnet/auto |
| 5 | **Companion app shell** — tray/background helper integrating phases 2–4; session wiring to the hosted gateway/canister (§2). | 2, 3, 4 | ⚡ Sonnet/auto |
| 6 | **Derived-artifact storage + provenance enforcement** — per-tier policy (§5), `generated_by`/`model`/`version`/`source_event_id` (§6), client-encryption hook for the privacy-max/ZK tier. | 0, 5 | 🔀 Hybrid |
| 7 | **Packaging / distribution** — code signing, notarization, least-privilege OS perms, auto-update channel + update integrity (§7). | 5 | 🔀 Hybrid |
| 8 | **Multi-device & offline fallback** — capability detection, graceful fallback (in-browser / managed-with-consent / embeddings-only), later re-sync of cached enrichment (§7). | 5, 6 | ⚡ Sonnet/auto |
| 9 | **7-tier test suites** — per component (§10). Security-tier design (DNS-rebinding, cross-origin, missing/invalid token, no ambient authority, prompt-injection, no secret in logs) is reasoning-heavy; the other tiers are routine. | per-phase | 🔀 Hybrid (security tier 🧠; unit/integration/e2e/stress/perf ⚡) |
| 10 | **Scooling consumption wiring** — consume the companion lane via `ModelRuntimeAdapter` only; managed-lane metering against Knowtation packs; local/in-browser/BYO unmetered (§8). | 1, 6 | ⚡ Sonnet/auto |

### Why the 🧠 / 🔀 phases need deeper reasoning

- **Phase 0** decides consent and money flow across tenants. A wrong rule here (e.g. a member's companion silently enriching an owner's notes) is a privacy/billing defect that propagates into every later phase. Reason it through explicitly.
- **Phase 2** is *the* core of this gate. DNS-rebinding and cross-origin abuse are adversarial; the defense must be argued against an attacker model, not pattern-matched. This phase, and its security tests in Phase 9, are the highest-leverage place for a thinking model.
- **Phase 3** is auth/crypto protocol correctness (PKCE, redirect handling, keychain). Subtle deviations create real account-compromise paths.
- **Phases 1, 6, 7** are hybrids: the **contract/threat surface** (adapter interface, ZK encryption boundary, update-integrity/supply-chain) warrants a thinking model; the bulk implementation does not. Fix the seam first, then drop to Sonnet/auto.
- **Phases 4, 5, 8, 10** are well-specified engineering once the seams exist — Sonnet or automatic model mode is appropriate and cheaper.

---

## 13. Phase 0 — Decision Record (the three hard dependencies)

**Status:** DRAFT — awaiting owner approval. **No code.** The gate's [“DOES NOT approve (no code)”](#this-gate-does-not-approve-no-code) list remains fully in force.

**Branch:** `feat/companion-app` (Muse-canonical; not a docs-only PR to `main`).

**Model tier:** 🧠 Thinking (§12 phase table, row 0) — these are consent/money/privacy rules where a wrong default propagates into every later phase.

**Purpose:** resolve the three items under [“Hard dependencies (must be accepted BEFORE companion implementation)”](#hard-dependencies-must-be-accepted-before-companion-implementation). Output is **accepted decisions**, not implementation.
### 13.0 Grounding (decisions anchored to existing code, not assumptions)

| Decision area | Source of truth in the codebase |
| --- | --- |
| Tenancy / delegation resolution | `hub/lib/hosted-workspace-resolve.mjs` → `resolveEffectiveCanisterUser`, `resolveAllowedVaultIdsForHostedContext`; `HOSTED_VALID_ROLES = {admin, editor, viewer, evaluator}` |
| Hosted owner stub (today) | brief §10A: `/api/v1/workspace` → `owner_user_id: null`; invites “not supported on hosted yet”; roles env-only |
| Billing / packs / metering | `hub/gateway/billing-constants.mjs` (tiers `free·plus·growth·pro`, `PACK_TOKENS`, `COST_CENTS`), `hub/gateway/billing-middleware.mjs` (`runBillingGate` meters on `getUserId(req)`) |
| Platform operator vs workspace owner | `HUB_ADMIN_USER_IDS` (global allowlist) — distinct from any workspace role |
| Scooling lane enum | `scooling/src/adapters/types.ts` → `runtimeLaneSchema = [local, self_hosted, enterprise, openrouter, direct_provider, disabled]` |
| Enrichment artifacts | `mcp/tools/index-enrich.mjs` (`ai_summary`), `lib/tag-suggest.mjs` (embeddings), `lib/memory-consolidate.mjs` (`runDiscoverPass` → connections/contradictions/open_questions/topic_count) |

### Decision index

| ID | Hard dependency | Outcome |
| --- | --- | --- |
| **D1** | Hosted tenancy + owner-vs-member billing/consent (gate item 1; brief §8.7, §10A) | **ACCEPT, with conditions** |
| **D2** | Model-routing lane matrix + default-lane logic + client-side constraint (gate item 2; brief §3, §4) | **CONFIRM** |
| **D3** | Derived-artifact storage per privacy tier (gate item 3; brief §8.2; gate §5) | **CONFIRM, with per-artifact detail** |

---

### D1 — Hosted tenancy + owner-vs-member billing/consent

**Simple summary.** Every person owns their own workspace. Someone you invite (a “member”) can only touch your notes if you gave them a role that already lets them read those notes. A member running AI **on their own computer** (their companion) over your notes is **free** and is allowed **only** if (a) they could already read those notes and (b) you turned on “let my team enrich my notes.” You are never billed for a member’s on-device work; you are only billed when work uses the paid cloud lane on **your** workspace — and members can’t trigger that paid lane on your workspace unless you explicitly allow it. For a Privacy-max (zero-knowledge) workspace, the math itself stops a member from reading anything you didn’t cryptographically share with them.

**Technical summary.** Tenancy uses the existing owner/delegation primitive (`resolveEffectiveCanisterUser`): an actor acts on their **own** canister partition unless they appear in the owner’s hosted role store (`HOSTED_VALID_ROLES`), in which case `delegate = true` and `effective = owner`. Phase 0 fixes the **policy** layered on that primitive; the tenancy implementation (auto-owner provisioning, hosted role store, invites) is its **own** design + gate (brief §10A) and is a prerequisite, not part of this record.

**D1.1 — Workspace ownership.** Each user is auto-provisioned as **owner of exactly one workspace** on first sign-in. The **platform operator** (`HUB_ADMIN_USER_IDS`) is a separate, global, rare role and is **never** a workspace role. A user is therefore *owner of their own* and *member of others’* (via delegation). **Binding security constraint:** auto-owner provisioning must never let actor A reach actor B’s partition unless B placed A in B’s role store — i.e. the `effective` resolution is the *only* path to another partition. (Proof obligation belongs to the tenancy gate.)
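The delegation rule in the technical summary can be illustrated with a small resolver. So can the D1.1 constraint that another partition is reachable only through that owner's role store.

This is a hypothetical sketch and not the signature of `resolveEffectiveCanisterUser`. The following are assumptions made for illustration:

- the `resolveEffectiveUserSketch` name;
- the `Map<ownerId, Map<actorId, role>>` role-store shape;
- the `requestedOwnerId` input;
- returning `null` to mean "no access".

```javascript
// Mirrors the HOSTED_VALID_ROLES set listed in the §13.0 grounding table.
const VALID_DELEGATE_ROLES = new Set(['admin', 'editor', 'viewer', 'evaluator']);

/**
 * Hypothetical resolver: which partition may `actorId` act on?
 * roleStore: Map<ownerId, Map<actorId, role>>, i.e. each owner's hosted role store.
 * Returns null when the actor has no valid grant on the requested partition.
 */
function resolveEffectiveUserSketch(actorId, requestedOwnerId, roleStore) {
  // Solo case: no target workspace, or the actor targets their own.
  if (!requestedOwnerId || requestedOwnerId === actorId) {
    return { actor: actorId, effective: actorId, delegate: false };
  }
  // The owner's role store is the ONLY path into another partition (D1.1).
  // Platform-operator status (HUB_ADMIN_USER_IDS) is deliberately not consulted.
  const role = roleStore.get(requestedOwnerId)?.get(actorId);
  if (!VALID_DELEGATE_ROLES.has(role)) {
    return null; // the caller should reject the request, not fall back silently
  }
  return { actor: actorId, effective: requestedOwnerId, delegate: true, role };
}
```

Returning `null` rather than quietly switching to the actor's own partition keeps a mistyped or forged workspace target visible as an error. Under D1.2, the `effective` field (not `actor`) would be the billing identity for a delegated request.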
**D1.2 — Billing principal = the workspace whose partition is written.** Metered operations (`COST_CENTS`: search/index/note_write/proposal_write/consolidation) and **managed-cloud model calls** bill against the **owner of the partition the operation executes on** (`effective` user), **not** the requesting actor when they are a delegate. Rationale: the data, storage, and provider-cost are the owner’s; the owner controls workspace spend. Solo operations on a user’s own partition bill to that user (owner == actor).

**Implementation note (binding):** `runBillingGate` currently meters on a single `getUserId(req)`; the tenancy work must supply the **effective/owner** id as the billing identity for delegated requests. Until that exists, **delegated managed-lane and metered ops are not enabled** (see D1.4).

**D1.3 — May a member’s companion enrich an owner’s notes? (brief §8.7 — the crux).** **Yes, but only when ALL of the following hold:**

1. **No new read capability.** The member already has body-read scope on those notes via role + `resolveAllowedVaultIdsForHostedContext` / scope map. Local inference grants **zero** additional read access — it can only process what the member could already read.
2. **Owner opt-in.** The owner has enabled **“allow delegated companion enrichment”** at the workspace level. **Default: OFF.**
3. **ZK is self-enforcing.** For a Privacy-max/ZK owner, the member can only enrich notes whose per-note DEK the owner has wrapped to the member’s key (brief §9.4). No new mechanism — the cryptography is the gate; an un-shared note is unreadable on the member’s device, full stop.
4. **Provenance, downgrade-safe.** The written artifact records `generated_by = member actor`, `source = companion`, `model`, `version`, `date`, `source_event_id` (§6), and is stored under the **owner’s** privacy tier per D3 — a member’s companion must **never downgrade** an owner’s tier (a ZK owner’s artifact stays client-encrypted even though the member generated it).
5. **Consent-tracked + quota-bounded.** The enrichment event is consent-logged and counts against the workspace’s enrichment quota (abuse control, brief §8.10).

**Billing of D1.3:** a member’s companion is the **local lane → not metered** (brief §6 principle 1). The owner is therefore **not** billed for a member’s on-device enrichment (no provider cost exists to meter). Only a **managed-lane** path would be billable, and that is governed by D1.2 + D1.4.

**D1.3 clarification (added during Phase 1 implementation review — RATIFIED by owner 2026-06-05).** The Phase 1 seam (`lib/model-runtime-lane.mjs`, `enforceConsentPolicy`) enforces D1.3(2) as a **fail-closed** gate (`delegatedEnrichmentAllowed`, default OFF) on a delegate’s enrichment write-back to the owner’s partition. Two implementation specifics were resolved that D1.3 above did not spell out:

1. **Scope of the gate by lane.** The opt-in gates the **`local` companion lane** (named in D1.3) **and** the **`openrouter` BYO-key lane** — both route the owner’s note text **off the owner’s own infrastructure** (local = the *delegate’s* device; openrouter = the *delegate’s* third-party contract), so they are treated identically. Org lanes (`self_hosted`, `enterprise`) are **not** gated by this individual opt-in — the org controls the endpoint and governs that path by org policy. The managed lane (`direct_provider`) remains under **D1.4** (`delegatedManagedAllowed`).
2. **Enrichment vs. ephemeral completion.** The gate applies only when the call **writes a derived artifact to the owner’s partition** (`enrichesDelegatedPartition=true`). A read-only/ephemeral completion by a delegate who already has read scope (D1.3(1)) is allowed — it produces no owner-attributed artifact.

This closes the gate §12 canonical defect (*“a member’s companion silently enriching an owner’s notes”*), which the original Phase 1 contract left as a silent `allow`.
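The lane scoping in the clarification above reduces to a small, fail-closed decision function.

The sketch is hypothetical and is not the signature of `enforceConsentPolicy` in `lib/model-runtime-lane.mjs`. The `consentDecisionSketch` name, the context-object shape and the `reason` strings are assumptions. Only the following come from the text:

- the lane identifiers;
- the `delegate`, `enrichesDelegatedPartition`, `delegatedEnrichmentAllowed` and `delegatedManagedAllowed` concepts.

The sketch also assumes the D1.3(1) read-scope check has already passed upstream.

```javascript
// Lanes where a delegate's call moves the owner's note text off the owner's
// infrastructure: the delegate's device (local) or their third-party contract (openrouter).
const DELEGATE_EGRESS_LANES = new Set(['local', 'openrouter']);
// Org-controlled endpoints: governed by org policy, not the individual opt-in.
const ORG_LANES = new Set(['self_hosted', 'enterprise']);
const MANAGED_LANE = 'direct_provider';

// Hypothetical D1.3 / D1.4 consent gate. Assumes read scope (D1.3(1)) was
// verified upstream. An opt-in flag counts only when it is exactly `true`.
function consentDecisionSketch(ctx) {
  const { lane, delegate = false, enrichesDelegatedPartition = false } = ctx;
  const enrichmentOptIn = ctx.delegatedEnrichmentAllowed === true;
  const managedOptIn = ctx.delegatedManagedAllowed === true;

  if (lane === 'disabled') return { allow: false, reason: 'lane_disabled' };
  const knownLane =
    DELEGATE_EGRESS_LANES.has(lane) || ORG_LANES.has(lane) || lane === MANAGED_LANE;
  if (!knownLane) return { allow: false, reason: 'unknown_lane' };

  // A non-delegate acts on their own partition; D1.3 does not apply.
  if (!delegate) return { allow: true, reason: 'own_partition' };

  if (DELEGATE_EGRESS_LANES.has(lane)) {
    // Read-only/ephemeral completion writes no owner-attributed artifact.
    if (!enrichesDelegatedPartition) return { allow: true, reason: 'ephemeral_completion' };
    return enrichmentOptIn
      ? { allow: true, reason: 'owner_enrichment_opt_in' }
      : { allow: false, reason: 'delegated_enrichment_off' };
  }
  // Org policy for these lanes is enforced elsewhere (not modeled here).
  if (ORG_LANES.has(lane)) return { allow: true, reason: 'org_policy' };

  // Managed lane: governed by D1.4, OFF for delegates unless the owner opts in.
  return managedOptIn
    ? { allow: true, reason: 'owner_managed_opt_in' }
    : { allow: false, reason: 'delegated_managed_off' };
}
```

Unknown lanes and non-boolean flag values both resolve to "deny". That is the fail-closed property the clarification requires.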
**Owner ratification (2026-06-05):** item 1 accepted — the openrouter BYO lane is gated identically to the companion, because a delegate’s BYO key routes the owner’s note text to a third party (higher egress than the on-device companion), so leaving it ungated would guard the lower-risk path and expose the higher-risk one. The owner opt-in (`delegatedEnrichmentAllowed`, default OFF) is a one-time flip that covers a team’s deliberately shared key.

**D1.4 — Consent + quota defaults (anti-surprise-spend).**

- **Members cannot trigger the managed (paid) lane on an owner’s partition by default.** It is **OFF** until the owner explicitly enables it, and even then is bounded by an owner-set per-member quota. This prevents a careless/malicious delegate from draining the owner’s packs (`PACK_TOKENS`).
- **Members can always read the owner’s already-produced derived artifacts** (subject to scope) — reading is free; only *producing* via a paid lane is gated.
- Auto-enrichment of private notes (even local) is consent-tracked; **stricter rules for minors/classrooms** are deferred to the consent/data-lifecycle item (gate §11) but the **default-OFF posture above is the safe baseline** until that lands.

**D1 adversarial check.** Threat: delegate exfiltrates owner plaintext via local model. → Bounded by D1.3(1): the delegate already had read access; local inference adds no exfil path beyond the role grant, and for ZK owners the crypto prevents it outright. Threat: delegate drains owner packs. → Bounded by D1.4 default-OFF + per-member quota. Threat: auto-owner escalation into another partition. → Bounded by D1.1 (effective-resolution is the only cross-partition path). Threat: member’s companion silently downgrades an owner’s ZK artifact to host-readable. → Forbidden by D1.3(4) + D3.
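Two rules combine into one authorization step for a managed-lane call:

- D1.2: bill the owner of the partition the call runs on;
- D1.4: delegated managed calls are OFF by default, and once enabled are bounded by an owner-set per-member quota.

The sketch is hypothetical. `MemberQuotaSketch`, `authorizeManagedCallSketch`, the abstract `units` cost and the one-quota-object-per-owner-workspace arrangement are illustrative assumptions. None of them is the `runBillingGate` API or the `COST_CENTS`/`PACK_TOKENS` schema.

```javascript
// Owner-set per-member budget for delegated managed-lane calls (D1.4).
// Assumed: one instance per owner workspace. Members without a limit get 0.
class MemberQuotaSketch {
  constructor() {
    this.limits = new Map(); // memberId -> units allowed
    this.used = new Map();   // memberId -> units consumed so far
  }
  setLimit(memberId, units) {
    this.limits.set(memberId, units);
  }
  // Consumes `units` only if the whole amount fits; otherwise usage is unchanged.
  tryConsume(memberId, units) {
    const limit = this.limits.get(memberId) ?? 0;
    const used = this.used.get(memberId) ?? 0;
    if (used + units > limit) return false;
    this.used.set(memberId, used + units);
    return true;
  }
}

// Hypothetical managed-lane authorization. `effectiveUserId` is the owner of
// the partition the call executes on (equal to actorId for solo work).
function authorizeManagedCallSketch(req, quota, units) {
  const billTo = req.effectiveUserId; // D1.2: the partition owner pays, never the delegate
  if (!req.delegate) return { allow: true, billTo };
  if (req.delegatedManagedAllowed !== true) {
    return { allow: false, billTo, reason: 'delegated_managed_off' };
  }
  if (!quota.tryConsume(req.actorId, units)) {
    return { allow: false, billTo, reason: 'member_quota_exceeded' };
  }
  return { allow: true, billTo };
}
```

With these defaults, a delegate can spend nothing on the owner's packs until the owner both enables the managed lane and sets a limit for that member. This is the "delegate drains owner packs" bound from the adversarial check.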
**D1 outcome: ACCEPTED as policy.** Hard prerequisite: the **tenancy implementation gate** (brief §10A) must land auto-owner provisioning, the hosted role store, invites, and **effective/owner billing identity** before any companion phase that writes to a delegated partition.

---

### D2 — Model-routing lane matrix, default-lane logic, client-side constraint

**Simple summary.** There are a few “lanes” a model call can travel. Cheap cloud is the default for solo users; a one-click “keep my data on my device” switch sends light tasks to the browser and heavy private tasks to the companion; privacy-focused orgs default to their own server or their own key and turn the cloud lane off. The unbreakable rule: **a model on your machine is always called from your machine** — the cloud never reaches into your laptop to run it.

**Technical summary.** Phase 0 **confirms the brief §4 matrix** as the canonical Knowtation lane set and the brief §3 client-side-inference constraint, and fixes the mapping to Scooling’s `runtimeLaneSchema` so Phase 1’s `ModelRuntimeAdapter` uses stable lane identifiers.
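The D2.4 mapping to `runtimeLaneSchema` and the D2.2 fallback chain are small enough to express as a lookup table plus an ordered selection.

In this sketch, the following are hypothetical:

- the Knowtation-internal lane names (`in_browser`, `companion`, `managed_cheap`, and so on);
- the capability flags;
- both function names.

Only the six Scooling values come from `runtimeLaneSchema`. `pickPrivateLaneSketch` models the "keep my data on my device" path for an individual user.

```javascript
// D2.4: Knowtation-internal lane -> Scooling runtimeLaneSchema value.
const TO_SCOOLING_LANE = new Map([
  ['in_browser', 'local'],               // client-side; browser vs companion is opaque to Scooling
  ['companion', 'local'],
  ['self_hosted', 'self_hosted'],
  ['enterprise', 'enterprise'],
  ['byo_openrouter', 'openrouter'],
  ['managed_cheap', 'direct_provider'],  // metered against Knowtation packs
  ['managed_premium', 'direct_provider'],
  ['byo_direct', 'direct_provider'],
  ['embeddings_only', 'disabled'],       // fallback exhausted: no inference
]);

function toScoolingLane(internalLane) {
  const lane = TO_SCOOLING_LANE.get(internalLane);
  if (lane === undefined) throw new Error(`unknown internal lane: ${internalLane}`);
  return lane;
}

// D2.2 fallback chain for the "keep my data on my device" toggle:
// in-browser -> companion -> managed (explicit per-action consent only) -> embeddings-only.
function pickPrivateLaneSketch({
  browserCapable = false,
  heavyTask = false,
  companionAvailable = false,
  managedConsent = false,
} = {}) {
  if (browserCapable && !heavyTask) return 'in_browser'; // light tasks stay in the tab
  if (companionAvailable) return 'companion';            // heavy or browser-incapable
  if (managedConsent === true) return 'managed_cheap';   // private text needs explicit consent
  return 'embeddings_only';
}
```

A `Map` is used instead of a plain object, so inherited keys such as `toString` cannot be mistaken for a lane. Both client-side lanes collapse to `local`, matching the D2.4 note that the device-capability choice stays Knowtation-internal.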
**D2.1 — Confirmed lane set (canonical, from brief §4):**

| Lane | Privacy | Billing | Invoked |
| --- | --- | --- | --- |
| Managed cloud — cheap (default for individuals) | Low (text → 3rd party; needs consent for private text) | **Packs (metered)** | Cloud gateway |
| Managed cloud — premium | Low | **Packs (metered)** | Cloud gateway |
| In-browser (WebGPU / WebLLM) | High (runs in tab) | **Free** | **Client-side** |
| Companion (bundled local runtime) | Highest (never leaves device) | **Free** (user compute) | **Client-side** |
| Self-hosted / enterprise endpoint | High (org infra) | Free / org contract | Org endpoint |
| BYO key (OpenRouter / direct provider) | Medium (user’s own contract) | **No packs** (user pays provider) | Provider |

The four lanes the companion design pivots on are **managed / in-browser / companion / BYO-key**; the self-hosted/enterprise lane is the org-privacy variant. “Managed” has cheap + premium tiers.

**D2.2 — Default-lane selection logic (confirmed, brief §4 “Defaults”):**

- **Individual hosted user →** managed **cheap-model** lane by default; a one-click **“keep my data on my device”** toggle routes light tasks to **in-browser**, and **offers the companion** when the task is too heavy for the browser.
- **Privacy-focused org →** default **self-hosted / BYO endpoint**; managed lane **OFF** (a selling point).
- **Product picks the safe default and shows the trade-off.** Graceful **fallback chain** when a device can’t run a client-side lane: **in-browser → companion → managed-with-explicit-consent → embeddings-only.**
- Private text never goes to a managed lane without explicit per-action consent.

**D2.3 — Client-side-inference HARD CONSTRAINT (confirmed, brief §3):** the cloud gateway/canister **never proxies local/private inference**. In-browser and companion lanes are invoked **only** by something on the user’s machine (the tab or the companion).
The cloud continues to serve **data, identity, permissions, billing, sync** and nothing else for these lanes. **Any future design that routes `localhost`/on-device inference through the cloud fails this gate.** **D2.4 — Mapping to Scooling `runtimeLaneSchema` (factual reconciliation).** Scooling already exposes `[local, self_hosted, enterprise, openrouter, direct_provider, disabled]`. Canonical mapping adopted at the `ModelRuntimeAdapter` boundary (Phase 1): | Brief §4 lane | Scooling lane | Note | | --- | --- | --- | | In-browser **and** Companion | `local` | Both are *client-side*; the in-browser-vs-companion choice is a Knowtation-internal device-capability decision, opaque to Scooling. | | Self-hosted | `self_hosted` | Org endpoint. | | Enterprise endpoint | `enterprise` | Org contract endpoint. | | BYO key (OpenRouter) | `openrouter` | Already present; provider arrives “for free” via the §9 OpenRouter lane. | | Managed cloud (cheap/premium) & direct BYO provider | `direct_provider` | Managed/premium routed through **Knowtation packs**; Scooling runs **no** model billing (brief §6.3, gate §8). | | Lane off / fallback exhausted | `disabled` | Falls back to embeddings-only / no inference. | **D2 outcome: CONFIRMED.** The §4 matrix, the default-lane logic, the client-side constraint, and the Scooling mapping are accepted as the basis for the Phase 1 `ModelRuntimeAdapter` seam. --- ### D3 — Derived-artifact storage per privacy tier **Simple summary.** Where do the AI by-products live — the short summary of a note, the math “fingerprints” used for search (embeddings), and the insight events (connections, open questions)? For **Convenience** users: in the cloud, as today. For **Privacy-max** users: **never** as something the host can read — either kept on the device or encrypted with the user’s own key before it’s uploaded. 
Generating something privately on-device and then storing it readable in the cloud would quietly defeat the privacy promise; this decision forbids that.

**Technical summary.** Phase 0 **confirms gate §5** and finalizes it **per artifact type**. The ZK key hierarchy itself stays out of scope (brief §9 owns it); this record fixes only the **storage location + host-readability policy**.

**Critical clarification of the “paradox”:** today’s memory events use **AES-256-GCM with a server-held key** (`KNOWTATION_MEMORY_SECRET`, brief §9.1). That is **encryption-at-rest, NOT zero-knowledge**, because the operator can decrypt. For Privacy-max, “client-encrypted before upload” means encryption under a **client-held (ZK) key**. The existing server-held-key encryption does **not** satisfy the Privacy-max requirement.

**D3.1 — Per-artifact, per-tier matrix (final):**

| Artifact | Convenience (server holds key) | Privacy-max / ZK (user holds key) |
| --- | --- | --- |
| `ai_summary` (`mcp/tools/index-enrich.mjs`) | Cloud canister, host-readable plaintext, as today | **Local-only cache, or client-encrypted (envelope under per-note/vault DEK) before upload.** Stored as **ciphertext only**; host cannot read. |
| Embeddings / vectors (`lib/tag-suggest.mjs`) | Cloud canister server-side vector index, as today | **Computed client-side**; vectors stored server-side **only as client-encrypted ciphertext** (enables sync/backup) **or** kept local-only; **plaintext vectors never leave the device**; vector search runs client-side (brief §9.5). |
| Insight events (`runDiscoverPass`: connections / contradictions / open_questions / topic_count) | Cloud canister memory store, as today (server-readable even where AES-256-GCM-at-rest, per server-held key) | **Computed client-side (companion)**; stored **client-encrypted** under `DEK-memory`; host cannot read (brief §9.5). |

**D3.2 — Binding policy (gate §5, restated and locked):** Privacy-max derived artifacts are **never** stored as **host-readable plaintext** and are **never** stored under a **server-held key**. The only acceptable Privacy-max storage states are **(a) local-only** or **(b) client-encrypted under a user-held key** before upload.

**D3.3 — Multi-device interaction (brief §8.5).** Cached-result location follows D3.1. For Privacy-max, artifacts sync between devices **only as ciphertext**. A device without the key (e.g. a phone with no companion) sees only ciphertext and falls back per D2.2 (embeddings-only / no AI) until that device’s key is enrolled. Cross-device DEK re-wrap is ZK’s concern (brief §9.4) and is out of scope here.

**D3.4 — Retention / deletion (brief §8.8, baseline).** Derived artifacts inherit the **source note’s** retention: deleting a note deletes its derived artifacts (summary, vectors, insight events). For Privacy-max, destroying the key **crypto-shreds** all derived artifacts (they become permanently unreadable). Detailed lifecycle and minors rules remain in the deferred consent item (gate §11); this is the safe baseline.

**D3 outcome: CONFIRMED.** Per-artifact storage is fixed; the encryption mechanism is delegated to the ZK tier (brief §9).

---

### 13.1 What Phase 0 unblocks

With D1–D3 accepted, the following become available to later phases:

- **Phase 1 (`ModelRuntimeAdapter` seam + lane matrix):** lane identifiers (D2.1), default-lane logic (D2.2), client-side constraint (D2.3), Scooling mapping (D2.4), and the metering boundary (D1.2 — owner-billed, managed-only).
- **Phase 6 (derived-artifact storage + provenance):** per-tier storage policy (D3) and provenance fields (D1.3(4), §6).
- **Phase 10 (Scooling consumption):** the unmetered-local / owner-billed-managed rule (D1.2, D2.4).

### 13.2 What remains NOT approved by this record

Phase 0 approves **decisions only**. The gate’s [“DOES NOT approve (no code)”](#this-gate-does-not-approve-no-code) list is unchanged: no binary, no loopback listener, no new canister/Hub routes, no new storage paths, no OAuth scope changes. D1 additionally has a **hard prerequisite**: the **tenancy implementation gate** (auto-owner provisioning, hosted role store, invites, effective/owner billing identity) must land before any companion phase writes to a delegated partition.

### 13.3 Explicitly deferred (not Phase 0 blockers)

The following are deferred: consent and data-lifecycle detail, including minors/classrooms (gate §11); the quality/eval loop (§11); managed-lane abuse/quota specifics (§11); distribution/signing/auto-update (§7/§11); and the entire ZK key hierarchy + PQC (brief §9). None of them block Phase 1. Until they land, D1 and D3 carry the safe default-OFF and never-host-readable baselines.

### 13.4 Approval

| Decision | Recommendation | Owner approval |
| --- | --- | --- |
| D1 — tenancy + owner-vs-member billing/consent | ACCEPT (with tenancy-gate prerequisite) | ☐ pending |
| D2 — lane matrix + defaults + client-side constraint | CONFIRM | ☐ pending |
| D3 — derived-artifact storage per tier | CONFIRM | ☐ pending |

On owner approval of D1–D3, Phase 0 is **complete** and work proceeds to **Phase 1 — 🔀 Hybrid: `ModelRuntimeAdapter` seam** (design/spec the seam with a thinking model, implement with Sonnet/auto).
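Phase 0 approves decisions only, so what follows is not an implementation. It is a hypothetical sketch of how Phase 1 routing might encode two things: the D2.2 defaults (including the fallback chain and per-action consent) and the owner-billed, managed-only metering boundary that 13.1 attributes to D1.2. The request fields, the lane ids, and the `resolveLane` and `billTo` helpers are all assumptions. On the default path, the sketch returns `null` when private text lacks consent. That behavior is this sketch's own choice: the decision records only that such text must not reach a managed lane.

```javascript
// Hypothetical Phase 1 routing sketch illustrating D2.2 and the D1.2 metering rule.
// All field names, lane ids, and helpers are illustrative assumptions, not an existing API.
const MANAGED_LANES = new Set(["managed_cheap", "managed_premium"]);

/**
 * Pick a lane for one model call, or null for embeddings-only / no inference.
 * @param {object} req
 * @param {"individual"|"privacy_org"} req.profile
 * @param {boolean} req.keepOnDevice  the one-click "keep my data on my device" toggle
 * @param {boolean} req.heavy         task is too heavy for in-browser inference
 * @param {{webgpu: boolean, companionRunning: boolean}} req.device
 * @param {boolean} req.privateText   payload contains private note text
 * @param {boolean} req.consent       explicit per-action consent for a managed lane
 * @param {string} [req.orgLane]      org default, e.g. "self_hosted" or "byo_openrouter"
 */
function resolveLane(req) {
  const { profile, keepOnDevice, heavy, device, privateText, consent } = req;

  // Privacy-focused org: the managed lane is OFF; use the org endpoint or BYO key.
  if (profile === "privacy_org") return req.orgLane ?? "self_hosted";

  if (!keepOnDevice) {
    // Individual default: managed cheap lane. Private text needs explicit
    // per-action consent; without it this sketch declines (assumption).
    return privateText && !consent ? null : "managed_cheap";
  }

  // "Keep my data on my device": walk the fallback chain
  // in-browser -> companion -> managed-with-explicit-consent -> embeddings-only.
  const chain = [
    ["in_browser", device.webgpu && !heavy],
    ["companion", device.companionRunning], // a real UI would offer it, not force it
    ["managed_cheap", consent],
  ];
  for (const [lane, usable] of chain) {
    if (usable) return lane;
  }
  return null; // embeddings-only; maps to Scooling "disabled"
}

// Metering boundary: only managed lanes are metered, and they bill the owner
// identity rather than the member who triggered the call.
function billTo(lane, { ownerId }) {
  return MANAGED_LANES.has(lane) ? ownerId : null;
}
```

Consider a user with the toggle on who runs a heavy task. If the companion is running, the task resolves to `"companion"`. With no companion and no consent, it resolves to `null`, the embeddings-only end of the chain.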