Flow Execution Gate — Canonical Contract (Phase 7A, Step 7A-L3a)

Status: Contract only; Thinking step (7A-L3a). This is the frozen, canonical contract for the automatable step execution gate and run advancement. It covers when `automatable: automatable` steps may advance via server-side orchestration, how manual run writes work, how consent and cost caps apply, and how this gate stays wholly separate from external-agent grants (7A-L2). No implementation, no routes, no MCP/CLI wiring, no posture flip, and no model invocation ships in this step. The mechanical implementation (run handlers, consent ledger, ModelRuntimeAdapter bridge, Scooling live wire, seven-tier test bodies) is 7A-L3b (Auto), written to this contract without redesigning it.

Authored on branch feat/flow-projection-pilot (Knowtation). Always target the repo explicitly with muse -C ~/knowtation ….

docs/FLOW-V0-SPEC.md — §1.1 (Automatable, RunStatus, StepStateStatus), §3 (gated flow_run surfaces), §6 items 5/7/9/10 (review-before-write for outcomes; automatable gated by consent + cost caps; classroom policy; version pinning).
docs/FLOW-EXTERNAL-AGENT-CONTRACT-7A-L2.md — separate gate (SD-5); external grants and external_tool invoke never substitute for automatable execution.
docs/FLOW-AUTHORING-WRITEBACK-CONTRACT-7A-L1.md — import path; §5 sandbox carry-over extended here for automatable steps.
docs/FLOW-STORE-CONTRACT-7A-10.md — runs[] persistence; read invariants for step_states.
scooling/docs/FLOW-EXECUTION-LIVE-WIRE-CONTRACT-7A-L3.md — the consumer half (run-write + automatable execution double-lock posture) ratified field-for-field against this contract.
scooling/docs/FLOW-ADAPTERS-CONTRACT-7A-5.md — FlowRunAdapter method shapes; FLOW_RUN_WRITES_AUTHORIZED and FLOW_AUTOMATABLE_EXECUTION_AUTHORIZED.

Scope fence (7A-L3a): run-start/advance/evidence wire shapes + automatable execution orchestration rules + consent/cost-cap model + import sandbox for automatable steps + separation from SD-5 external-agent grants + error taxonomy + seven-tier test matrix only. Not in scope: handler impl, routes, MCP/CLI wiring, OpenAPI edits (land with routes in 7A-L3b), capture (7A-L4), MuseHub enrichment (7A-L5), or flipping FLOW_RUN_WRITES_ENABLED / FLOW_AUTOMATABLE_EXECUTION_ENABLED.

Simple summary

A Flow run tracks progress step by step. Until now every run write and every automatable step has been dead on arrival — you could read fixture runs but never start or advance one, and the server never executed a step for you. This contract freezes the rules for when those doors may open — still default off.

Two related capabilities, two separate locks:

Run writes: start a run, advance a step manually, attach evidence pointers, submit outcomes to review. Operational state lives in the flow store; only durable knowledge outcomes route through the review tray.
Automatable execution: only for steps marked `automatable: automatable`, Knowtation may orchestrate a server-side model lane (with explicit consent and cost caps) to produce a bounded execution result and advance the step. It never interprets step text as commands, never widens scope, and never reuses external-agent grants from 7A-L2.

Imported Flows may declare automatable steps, but they stay inert through import and until human review approves the canonical version. Classroom/org policy may forbid automatable steps entirely. Nothing here turns the gates on.

Technical summary

The execution gate unblocks two capability families behind independent posture flags: (A) run advancement (FLOW_RUN_WRITES_ENABLED, default off) — start/advance/evidence on knowtation.flow_run/v0 with ordered-step invariants, verification-before-done, and pinned flow_version; (B) automatable step execution (FLOW_AUTOMATABLE_EXECUTION_ENABLED, default off) — server-side orchestration via ModelRuntimeAdapter + BillingAdapter reservation, requiring per-run knowtation.flow_execution_consent/v0, vault policy caps, and step-level automatable === 'automatable'. agent_assisted and manual steps never invoke automatable execution — they use manual advancement only. external_tool skill-refs remain on the SD-5 external-agent gate — automatable execution may use mcp_prompt, skill_pack, and cli refs only (vault allowlist ∩ step refs). SD-6 records the separation from SD-5. Import sandbox rejects bundles whose automatable steps exceed org policy. Triple-exposed surfaces (CLI / MCP / Hub REST) converge on one handler family.

0. Design decision (recorded as SD-6)

How do automatable steps execute safely, separately from external agents? Recorded once in scooling/docs/CROSS-REPO-COORDINATION.md → Standing Decisions as SD-6:

SD-6 — Automatable execution is consent-gated server orchestration, not external-agent authority. Steps with automatable: automatable may advance via Knowtation server-side orchestration only when FLOW_AUTOMATABLE_EXECUTION_ENABLED is on, the actor holds valid knowtation.flow_execution_consent/v0 for the run, billing reserves within caps, and org policy permits. This path uses ModelRuntimeAdapter internally — it does not mint, accept, or substitute knowtation.flow_external_grant/v0 bearers (SD-5). External agents consume read-only agent_bundle projections and invoke external_tool refs through grants; they never trigger automatable execution. Automatable execution never activates external_tool. Run operational state (flow_run/v0) mutates in the flow store; durable knowledge outcomes still route through proposals (review-before-write). Implements FLOW-V0-SPEC §6 items 5, 7, and 9 literally.

1. Two sub-gates (independent posture)

Sub-gate	Knowtation control	Default	Unlocks
Run writes	`FLOW_RUN_WRITES_ENABLED`	off	`startRun`, manual `advanceStep`, `recordEvidence`, `submitToReview`
Automatable execution	`FLOW_AUTOMATABLE_EXECUTION_ENABLED`	off	`executeAutomatableStep` server orchestration for `automatable: automatable` steps

Both may be implemented in 7A-L3b while staying off. Enabling either is Tier 3. Automatable execution requires run writes to be enabled (cannot execute without an active run), but run writes do not imply automatable execution.

Scooling mirrors with compile-time FLOW_RUN_WRITES_AUTHORIZED and FLOW_AUTOMATABLE_EXECUTION_AUTHORIZED (both hard-false) plus env double-locks (consumer contract §1).
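
The dependency between the two sub-gates can be sketched as a small posture resolver. This is an illustration only: the flag names and the unlocked method names come from the table above, while `unlockedActions` and the shape of the `flags` object are hypothetical.

```javascript
// Hypothetical helper: derives the unlocked run methods from the two
// Knowtation posture flags. Only the flag and method names are from the contract.
function unlockedActions(flags) {
  const runWrites = flags.FLOW_RUN_WRITES_ENABLED === true;
  // Automatable execution needs an active run, so it also requires run writes.
  const automatable =
    runWrites && flags.FLOW_AUTOMATABLE_EXECUTION_ENABLED === true;

  const actions = [];
  if (runWrites) {
    actions.push("startRun", "advanceStep", "recordEvidence", "submitToReview");
  }
  if (automatable) {
    actions.push("executeAutomatableStep");
  }
  return actions;
}
```

With both flags unset (the default), the list is empty. Setting only the automatable flag still unlocks nothing, because there is no run to execute against.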

2. Surfaces (triple-exposed when sub-gate ON — design only in 7A-L3a)

All surfaces require the relevant sub-gate (§1) and resolve authority server-side. 7A-L3a freezes shapes; 7A-L3b wires them.

Surface	Start run	Get/list runs	Advance step	Record evidence	Execute automatable	Submit to review
MCP	`flow_run` (`action:start`)	`flow_run` (`action:get`\|`list`)	`flow_run` (`action:advance`)	`flow_run` (`action:evidence`)	`flow_run` (`action:execute_automatable`)	`flow_run` (`action:submit_review`)
Hub REST	`POST /api/v1/flows/{id}/runs`	`GET /api/v1/flows/{id}/runs`, `GET …/runs/{run_id}`	`POST …/runs/{run_id}/advance`	`POST …/runs/{run_id}/evidence`	`POST …/runs/{run_id}/execute-automatable`	`POST …/runs/{run_id}/submit-review`
CLI	`knowtation flow run start …`	`knowtation flow run get\|list …`	`knowtation flow run advance …`	`knowtation flow run evidence …`	`knowtation flow run execute …`	`knowtation flow run submit-review …`

Read paths (get/list runs) remain on the 7A-10 read store — unchanged. Write paths converge on one handler family (handleFlowRun*) with deep-equality parity across the three surfaces (§9 tier 2).

2.1 Request — start run (`flow_run` / `POST …/runs`)

{
  "flow_id": "flow_weekly_review",     // REQUIRED — readable in caller's scope
  "flow_version": "1.2.0",           // REQUIRED — semver pin; must match a visible canonical version
  "task_ref": "task_abc123",         // OPTIONAL — SD-2 link to Phase 2G task (id only)
  "external_ref": "muse:sha:…"       // OPTIONAL — lineage bridge pointer (id/hash only)
}

Response — knowtation.flow_run_start/v0:

{
  "schema": "knowtation.flow_run_start/v0",
  "run": { /* knowtation.flow_run/v0 — §3.1 */ }
}

2.2 Request — advance step (manual)

{
  "run_id": "run_2026w25",
  "step_id": "flow_weekly_review#1",
  "to_status": "in_progress|blocked|done|skipped"  // REQUIRED; never widens scope
}

Advancing to done when the step's verification sets evidence_required: true requires verified: true on that step state; otherwise ⇒ 403 FLOW_VERIFICATION_UNSATISFIED.
Skipping is allowed only when the step's canonical when_not_to_run contract is satisfied by an explicit skip_reason enum (7A-L3b impl) — never from free-text alone.
Out-of-order advance ⇒ 409 FLOW_STEP_OUT_OF_ORDER.
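
A minimal sketch of the ordering and verification checks for a manual advance follows. The error codes and status values are from this contract. The record fields (`frontier_ordinal`, `ordinal`, `evidence_required`, `verified`) and the function name are assumptions, and the skip rule is left out because its `skip_reason` enum is deferred to 7A-L3b.

```javascript
// Bounded to_status values from the advance request shape.
const ADVANCE_STATUSES = new Set(["in_progress", "blocked", "done", "skipped"]);

// Hypothetical check: field names on `run` and `stepState` are illustrative.
function checkManualAdvance(run, stepState, toStatus) {
  if (!ADVANCE_STATUSES.has(toStatus)) {
    throw new TypeError("to_status is not one of the bounded values");
  }
  // Only the step at the current ordinal frontier may advance.
  if (stepState.ordinal !== run.frontier_ordinal) {
    return { ok: false, status: 409, code: "FLOW_STEP_OUT_OF_ORDER" };
  }
  // done requires verified: true when the step demands evidence.
  if (toStatus === "done" && stepState.evidence_required && !stepState.verified) {
    return { ok: false, status: 403, code: "FLOW_VERIFICATION_UNSATISFIED" };
  }
  return { ok: true };
}
```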

2.3 Request — record evidence (pointer only)

{
  "run_id": "run_2026w25",
  "step_id": "flow_weekly_review#1",
  "evidence_ref": "prop_abc123",     // REQUIRED — pointer id/hash only
  "pointer_kind": "proposal|artifact|hash|test_result"  // REQUIRED — bounded enum
}

Never accepts raw content, note bodies, prompts, or completions.
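
One way to enforce the pointer-only rule is a strict request validator, sketched below. The four field names and the `pointer_kind` enum are from the request shape above. The pointer pattern and its 128-character limit are guesses, since the contract only says "pointer id/hash only", and `validateEvidenceRequest` is a hypothetical name.

```javascript
const POINTER_KINDS = new Set(["proposal", "artifact", "hash", "test_result"]);
const EVIDENCE_FIELDS = ["run_id", "step_id", "evidence_ref", "pointer_kind"];
// Assumed pointer shape: a short token with no whitespace (not from the contract).
const POINTER_PATTERN = /^[A-Za-z0-9_:.#-]{1,128}$/;

// Returns a list of problems; an empty list means the request is acceptable.
function validateEvidenceRequest(body) {
  const errors = [];
  // Reject any extra field so raw content cannot ride along.
  for (const key of Object.keys(body)) {
    if (!EVIDENCE_FIELDS.includes(key)) errors.push(`unexpected field: ${key}`);
  }
  for (const key of EVIDENCE_FIELDS) {
    if (typeof body[key] !== "string" || body[key] === "") {
      errors.push(`missing field: ${key}`);
    }
  }
  if (typeof body.pointer_kind === "string" && !POINTER_KINDS.has(body.pointer_kind)) {
    errors.push("pointer_kind outside bounded enum");
  }
  if (typeof body.evidence_ref === "string" && body.evidence_ref !== "" &&
      !POINTER_PATTERN.test(body.evidence_ref)) {
    errors.push("evidence_ref is not a pointer");
  }
  return errors;
}
```

Rejecting unknown fields outright, rather than ignoring them, is a design choice in this sketch: it keeps a stray `content` or `prompt` field from being silently persisted.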

2.4 Request — execute automatable step

{
  "run_id": "run_2026w25",
  "step_id": "flow_weekly_review#2",
  "consent_id": "fcons_<token>",     // REQUIRED — valid knowtation.flow_execution_consent/v0 for this run
  "model_lane": "local_default|cloud_premium",  // OPTIONAL — must ⊆ consent.allowed_lanes
  "dry_run": false                   // OPTIONAL — when true, validate gates only; no model call (7A-L3b)
}

Preconditions (all checked server-side; failures are opaque §8 codes):

Sub-gate FLOW_AUTOMATABLE_EXECUTION_ENABLED is on.
Target step's canonical automatable === 'automatable' (not manual / agent_assisted).
Run is in_progress; step is the current ordinal frontier (or explicitly in_progress).
Valid, unexpired consent_id bound to this run_id + actor.
Billing reservation succeeds within consent + vault caps.
Org/classroom policy permits automatable steps for this scope.
Step skill-refs ⊆ allowed internal kinds (mcp_prompt, skill_pack, cli) — never external_tool (SD-5).
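
The preconditions above can be read as an ordered list of checks, each paired with the §8 code returned on failure. The sketch below assumes they are evaluated in the listed order, which the contract does not state. The `facts` field names are hypothetical, and the skill-ref condition is omitted because the contract names no error code for it.

```javascript
// Each entry pairs one precondition with the code returned when it fails.
// Codes are from the error taxonomy; the `facts` fields are illustrative.
const EXECUTE_PRECONDITIONS = [
  [(f) => f.gateEnabled === true, "FLOW_AUTOMATABLE_EXECUTION_DISABLED"],
  [(f) => f.stepAutomatable === "automatable", "FLOW_STEP_NOT_AUTOMATABLE"],
  [(f) => f.runStatus === "in_progress", "FLOW_RUN_NOT_IN_PROGRESS"],
  [(f) => f.stepIsFrontier === true, "FLOW_STEP_OUT_OF_ORDER"],
  [(f) => f.consentValid === true, "FLOW_EXECUTION_CONSENT_REQUIRED"],
  [(f) => f.reservationOk === true, "FLOW_EXECUTION_COST_CAPPED"],
  [(f) => f.policyPermits === true, "FLOW_EXECUTION_POLICY_FORBIDDEN"],
];

// Returns the first failing code, or null when every precondition holds.
function firstFailedPrecondition(facts) {
  for (const [holds, code] of EXECUTE_PRECONDITIONS) {
    if (!holds(facts)) return code;
  }
  return null;
}
```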

Response — knowtation.flow_execute_automatable/v0:

{
  "schema": "knowtation.flow_execute_automatable/v0",
  "run": { /* updated knowtation.flow_run/v0 */ },
  "execution": {
    "execution_id": "fexec_<token>",
    "step_id": "flow_weekly_review#2",
    "status": "completed|failed|cost_capped|consent_denied",
    "evidence_ref": "hash_…",       // pointer only when completed + verification satisfied
    "cost_units": 42,                // bounded integer; no raw billing payload
    "model_lane": "local_default",
    "completed_at": "2026-06-20T12:00:00Z"
  }
}

The execution record never contains prompts, completions, or secrets.

2.5 Request — mint execution consent

{
  "run_id": "run_2026w25",
  "allowed_lanes": ["local_default"],  // REQUIRED, non-empty; ⊆ vault policy
  "cost_cap_units": 100,               // REQUIRED; server may lower to policy max
  "ttl_seconds": 3600                  // OPTIONAL; capped at policy max (default 3600, max 86400)
}

Response — knowtation.flow_execution_consent_mint/v0:

{
  "schema": "knowtation.flow_execution_consent_mint/v0",
  "consent": { /* knowtation.flow_execution_consent/v0 — §3.2 */ }
}

Consent is run-bound — not reusable across runs or flows.
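
The comments on the mint request say the server may lower `cost_cap_units` and cap `ttl_seconds`. A minimal sketch of that clamping follows. The 3600 default and 86400 ceiling are from the request shape. The `policy` object (`allowedLanes`, `maxCostCapUnits`, `maxTtlSeconds`), the function name, and the use of `FLOW_EXECUTION_LANE_DENIED` at mint time are assumptions.

```javascript
// Hypothetical clamp of a consent mint request against vault policy.
function clampConsentRequest(req, policy) {
  const lanes = req.allowed_lanes;
  if (!Array.isArray(lanes) || lanes.length === 0) {
    throw new RangeError("allowed_lanes must be a non-empty array");
  }
  if (!Number.isInteger(req.cost_cap_units) || req.cost_cap_units <= 0) {
    throw new RangeError("cost_cap_units must be a positive integer");
  }
  // Requested lanes must be a subset of the vault policy lanes.
  if (!lanes.every((lane) => policy.allowedLanes.includes(lane))) {
    return { ok: false, code: "FLOW_EXECUTION_LANE_DENIED" };
  }
  // ttl defaults to 3600 and never exceeds the policy max or 86400.
  const ttlCeiling = Math.min(policy.maxTtlSeconds ?? 86400, 86400);
  return {
    ok: true,
    allowed_lanes: [...lanes],
    cost_cap_units: Math.min(req.cost_cap_units, policy.maxCostCapUnits),
    ttl_seconds: Math.min(req.ttl_seconds ?? 3600, ttlCeiling),
  };
}
```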

2.6 Request — submit to review (durable outcome)

{
  "run_id": "run_2026w25",
  "intent": "Weekly review run outcome"  // REQUIRED, untrusted; never executed
}

Creates a standard Knowtation proposal (intent, external_ref from run lineage) — review-before-write for durable knowledge outcomes. Does not mutate canonical Flow definitions.

3. Canonical records

3.1 Run record — `knowtation.flow_run/v0` (unchanged from FLOW-V0-SPEC §1.5)

Invariants enforced on every write:

Rule	Contract
Version pin	`flow_version` immutable for the life of the run (§6 item 10).
Ordered frontier	At most one step `in_progress`; advance only to the next ordinal or explicit skip.
Done = verified	`status: done` on a step state requires `verified: true` when `evidence_required`.
Human review	Steps with `verification.kind: human_review` never receive `verified: true` from automatable execution — manual approval only.
Provenance	`provenance.actor` is hashed; `provenance.harness` is a label — never raw identity.
SD-2 link	Optional `task_ref` / `external_ref` are ids/pointers only; reciprocal link is maintained atomically when `task_ref` is supplied.

Operational run mutations write directly to the vault flow store (runs[]). They do not create proposals per tick. Only submitToReview and knowledge-producing outcomes route through /proposals (FLOW-V0-SPEC §6 item 5).

3.2 Execution consent — `knowtation.flow_execution_consent/v0`

{
  "schema": "knowtation.flow_execution_consent/v0",
  "consent_id": "fcons_<token>",
  "vault_id": "default",
  "scope": "personal|project|org",
  "run_id": "run_2026w25",
  "flow_id": "flow_weekly_review",
  "flow_version": "1.2.0",
  "allowed_lanes": ["local_default"],
  "cost_cap_units": 100,
  "cost_consumed_units": 0,
  "actor_hash": "<sha256>",
  "expires_at": "2026-06-20T13:00:00Z",
  "revoked_at": null
}

No model API keys, OAuth tokens, or billing account identifiers appear on the consent record.
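
As an illustration of how a handler might test a consent record before use, the sketch below checks revocation, expiry, run binding, actor binding, and remaining budget. The record fields are those shown above and the codes are from §8. The function name, the `requestedUnits` parameter, and reporting a wrong actor as `FLOW_EXECUTION_CONSENT_REQUIRED` are assumptions.

```javascript
// Hypothetical consent check. Returns an error code, or null when usable.
function checkConsent(consent, { runId, actorHash, nowMs, requestedUnits }) {
  if (!consent || consent.revoked_at !== null) {
    return "FLOW_EXECUTION_CONSENT_REQUIRED";
  }
  if (Date.parse(consent.expires_at) <= nowMs) {
    return "FLOW_EXECUTION_CONSENT_REQUIRED";
  }
  // Consent is run-bound: it cannot be replayed on another run.
  if (consent.run_id !== runId) {
    return "FLOW_EXECUTION_CONSENT_RUN_MISMATCH";
  }
  if (consent.actor_hash !== actorHash) {
    return "FLOW_EXECUTION_CONSENT_REQUIRED";
  }
  const remaining = consent.cost_cap_units - consent.cost_consumed_units;
  if (requestedUnits > remaining) {
    return "FLOW_EXECUTION_COST_CAPPED";
  }
  return null;
}
```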

4. Run advancement rules

4.1 Manual advancement (`agent_assisted` and `manual` steps)

Step `automatable`	Advancement path
`manual`	Human operator only — `advanceStep` / evidence / review.
`agent_assisted`	Human or scoped agent assists via existing agent surfaces; run adapter advances manually — no `executeAutomatableStep`.
`automatable`	Manual advancement still allowed when automatable gate is off; when gate is on, either manual advance or `executeAutomatableStep`, never both racing on the same step (optimistic concurrency on run etag — 7A-L3b).
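
The "never both racing" rule relies on optimistic concurrency over a run etag, which 7A-L3b will implement. A minimal compare-and-set sketch of the idea is below. The integer `etag` field, the in-memory run object, and `applyIfEtagMatches` are all assumptions for illustration, not the store's actual interface.

```javascript
// Hypothetical compare-and-set: the mutation applies only when the caller's
// etag matches the run's current etag, and every applied write bumps it.
function applyIfEtagMatches(run, expectedEtag, mutate) {
  if (run.etag !== expectedEtag) {
    return { ok: false, reason: "stale_etag" };
  }
  mutate(run);
  run.etag += 1;
  return { ok: true, etag: run.etag };
}

// A manual advance and an automatable execute both read the run at etag 1.
const run = { etag: 1, stepStatus: "pending" };
const manual = applyIfEtagMatches(run, 1, (r) => { r.stepStatus = "in_progress"; });
const auto = applyIfEtagMatches(run, 1, (r) => { r.stepStatus = "done"; });
```

Here the manual advance wins and the execute attempt is refused as stale, so only one path moves the step. The losing caller would need to re-read the run before retrying.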

4.2 Automatable execution orchestration (gate ON only)

Server-side pipeline (design — 7A-L3b implements):

validate gates → load pinned step (untrusted text) → resolve skill_refs (internal only)
→ BillingAdapter.reserve(cost_cap) → ModelRuntimeAdapter.run(lane, sandboxed context)
→ produce evidence pointer (hash/id) → evaluate verification (FlowVerificationAdapter rules)
→ update step_state → increment cost_consumed → emit safe observability metadata

Rule	Contract
Untrusted step text	`instruction`/`boundaries`/`output_shape` are data fed to the model sandbox — never executed, never interpreted as permission grants.
Scope frozen	Execution context is the run's `scope` — retrieval cannot widen.
No auto human_review	`human_review` verification never satisfied by automatable execution.
Cost cap	Exceeding `cost_cap_units` ⇒ `FLOW_EXECUTION_COST_CAPPED`; step stays non-`done`.
Idempotent execute	Re-posting the same `(run_id, step_id, consent_id)` while in-flight returns the in-flight `execution_id` (no double billing).
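
The idempotent-execute rule can be sketched as an in-flight registry keyed by `(run_id, step_id, consent_id)`. The key and the `fexec_` prefix are from this contract. The `Map`-based registry, the counter used as a token, and the `begin`/`finish` names are illustrative assumptions.

```javascript
// Hypothetical in-flight registry. A repeat post while the first execution
// is still running gets the same execution_id back, so billing happens once.
function createExecutionRegistry() {
  const inFlight = new Map();
  let counter = 0;
  const keyOf = (r) => JSON.stringify([r.run_id, r.step_id, r.consent_id]);

  return {
    begin(request) {
      const key = keyOf(request);
      const existing = inFlight.get(key);
      if (existing !== undefined) {
        return { execution_id: existing, reused: true };
      }
      counter += 1;
      const executionId = `fexec_${counter}`;
      inFlight.set(key, executionId);
      return { execution_id: executionId, reused: false };
    },
    // Called when the execution reaches a terminal status.
    finish(request) {
      inFlight.delete(keyOf(request));
    },
  };
}
```

A real implementation would need this registry to be shared across server processes. The single-process `Map` here only demonstrates the keying.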

5. Separation from external-agent grants (7A-L2 / SD-5)

| Concern | External-agent gate (SD-5) | Execution gate (SD-6) |
| --- | --- | --- |
| Purpose | Third-party agents consume read-only bundles + invoke `external_tool` | Knowtation orchestrates automatable steps server-side |
| Authority | `knowtation.flow_external_grant/v0` bearer | `knowtation.flow_execution_consent/v0` |
| Skill refs | `external_tool` only | `mcp_prompt`, `skill_pack`, `cli` only |
| Posture flag | `FLOW_EXTERNAL_AGENT_ENABLED` | `FLOW_AUTOMATABLE_EXECUTION_ENABLED` |
| Projections | `agent_bundle` harness | none (operates on run state) |
| Cross-use | Forbidden: a grant bearer does not satisfy execution consent | Forbidden: execution consent does not authorize `external_tool` invoke |

A step may declare both automatable: automatable and an external_tool skill-ref; each capability activates only through its own gate and never implies the other.
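
The separation can be illustrated with two small helpers: one resolves the skill-refs automatable execution may use (vault allowlist ∩ step refs, internal kinds only), the other shows that a credential satisfies only its own gate. The kind names and schema strings are from this contract. The `{ kind, id }` ref shape, the `schema` field on a grant, and both function names are assumptions.

```javascript
const INTERNAL_KINDS = new Set(["mcp_prompt", "skill_pack", "cli"]);

// Step refs of an internal kind that also appear on the vault allowlist.
// external_tool refs are never returned; they stay on the SD-5 gate.
function resolveExecutionRefs(stepRefs, vaultAllowlist) {
  const allowed = new Set(vaultAllowlist);
  return stepRefs.filter(
    (ref) => INTERNAL_KINDS.has(ref.kind) && allowed.has(ref.id)
  );
}

// A credential only satisfies the gate whose schema it carries.
function credentialSatisfies(gate, credential) {
  const requiredSchema = {
    execution: "knowtation.flow_execution_consent/v0",
    external_agent: "knowtation.flow_external_grant/v0",
  }[gate];
  return requiredSchema !== undefined && credential.schema === requiredSchema;
}
```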

6. Import sandbox carry-over (extends 7A-L1 §5 + 7A-L2 §6)

Rule	Contract
Parse-valid, runtime-inert	Bundles may declare `automatable: automatable` steps; import records them on the proposal — nothing executes on import.
Policy cap at import	When vault/org policy sets `automatable_forbidden: true`, any step with `automatable !== 'manual'` ⇒ `403 FLOW_IMPORT_AUTOMATABLE_DENIED`.
Review required	Automatable steps stay inert through approve; activation waits on the §1 sub-gate and the §2.4 preconditions.
No privilege escalation	`instruction` text cannot activate automatable execution; only schema-valid `automatable` field counts.
External tool unchanged	`external_tool` sandbox rules remain on 7A-L2 §6 — independent of automatable import rules.
Combined bundle	A bundle with both undeclared `external_tool` refs and policy-forbidden automatable steps fails at the first sandbox violation encountered (deterministic ordering: external_tool check, then automatable policy check).
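
The deterministic check order for a combined bundle can be sketched as follows. `FLOW_IMPORT_AUTOMATABLE_DENIED` and the `automatable_forbidden` policy key are from this contract. The `external_tool` denial code is defined in 7A-L2, so it is passed in as a parameter here rather than guessed. The bundle fields (`steps`, `skill_refs`, `declared_external_tools`) are assumed names.

```javascript
// Hypothetical import sandbox: external_tool check first, then the
// automatable policy check, so a combined bundle always fails the same way.
function checkImportSandbox(bundle, policy, externalToolDeniedCode) {
  const declared = new Set(bundle.declared_external_tools ?? []);
  for (const step of bundle.steps) {
    for (const ref of step.skill_refs ?? []) {
      if (ref.kind === "external_tool" && !declared.has(ref.id)) {
        return { ok: false, code: externalToolDeniedCode };
      }
    }
  }
  if (policy.automatable_forbidden === true) {
    // Per the policy-cap row, any step that is not 'manual' is refused.
    const offending = bundle.steps.some((step) => step.automatable !== "manual");
    if (offending) {
      return { ok: false, status: 403, code: "FLOW_IMPORT_AUTOMATABLE_DENIED" };
    }
  }
  return { ok: true };
}
```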

7. Posture / gating (default off)

Control	Where	Default	Tier to enable
`FLOW_RUN_WRITES_ENABLED`	Knowtation Hub/CLI/MCP policy	off	Tier 3
`FLOW_AUTOMATABLE_EXECUTION_ENABLED`	Knowtation Hub/CLI/MCP policy	off	Tier 3
`FLOW_RUN_WRITES_AUTHORIZED`	Scooling compile-time	false	Tier 3 (consumer contract)
`FLOW_AUTOMATABLE_EXECUTION_AUTHORIZED`	Scooling compile-time	false	Tier 3 (consumer contract)
Classroom / minor policy	Org policy	may forbid automatable	`FLOW_EXECUTION_POLICY_FORBIDDEN`
External-agent gate	unchanged	off	7A-L2 (SD-5) — independent

Enabling any control above is out of scope for 7A-L3a and 7A-L3b — impl ships with gates off.

8. Error taxonomy (opaque codes; no scope/id/secret leak)

New codes (7A-L3); existing codes reused unchanged:

Code	Status	When
`FLOW_RUN_WRITES_DISABLED`	403	run-write sub-gate off
`FLOW_AUTOMATABLE_EXECUTION_DISABLED`	403	automatable sub-gate off
`FLOW_EXECUTION_POLICY_FORBIDDEN`	403	org/classroom policy forbids
`FLOW_STEP_NOT_AUTOMATABLE`	400	step is `manual` or `agent_assisted`
`FLOW_EXECUTION_CONSENT_REQUIRED`	403	missing/invalid/expired consent
`FLOW_EXECUTION_CONSENT_RUN_MISMATCH`	403	consent bound to a different run
`FLOW_EXECUTION_COST_CAPPED`	403	billing cap exceeded
`FLOW_EXECUTION_LANE_DENIED`	403	requested lane ∉ consent/policy
`FLOW_VERIFICATION_UNSATISFIED`	403	advance/execute to `done` without proof
`FLOW_STEP_OUT_OF_ORDER`	409	ordinal frontier violated
`FLOW_RUN_NOT_IN_PROGRESS`	409	run terminal or not started
`FLOW_IMPORT_AUTOMATABLE_DENIED`	403	import declares automatable where policy forbids
`unknown_run`	404	missing or scope-invisible
`unknown_flow`	404	missing or scope-invisible (unchanged)
`FLOW_EXTERNAL_*`	—	not used by execution handlers (SD-5/SD-6 separation)

Codes never carry vault ids, consent tokens, model payloads, or raw step bodies.
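
For reference, the code-to-status mapping in the table can be written as a frozen lookup with a builder that emits only the code and status. The codes and statuses are copied from the table. The `{ error: code }` body shape and the function name are assumptions.

```javascript
// Status per code, as listed in the taxonomy table.
const FLOW_ERROR_STATUS = Object.freeze({
  FLOW_RUN_WRITES_DISABLED: 403,
  FLOW_AUTOMATABLE_EXECUTION_DISABLED: 403,
  FLOW_EXECUTION_POLICY_FORBIDDEN: 403,
  FLOW_STEP_NOT_AUTOMATABLE: 400,
  FLOW_EXECUTION_CONSENT_REQUIRED: 403,
  FLOW_EXECUTION_CONSENT_RUN_MISMATCH: 403,
  FLOW_EXECUTION_COST_CAPPED: 403,
  FLOW_EXECUTION_LANE_DENIED: 403,
  FLOW_VERIFICATION_UNSATISFIED: 403,
  FLOW_STEP_OUT_OF_ORDER: 409,
  FLOW_RUN_NOT_IN_PROGRESS: 409,
  FLOW_IMPORT_AUTOMATABLE_DENIED: 403,
  unknown_run: 404,
  unknown_flow: 404,
});

// Builds an opaque error: no ids, tokens, payloads, or step text are attached.
function toErrorResponse(code) {
  if (!Object.prototype.hasOwnProperty.call(FLOW_ERROR_STATUS, code)) {
    throw new RangeError("code is not part of the execution taxonomy");
  }
  return { status: FLOW_ERROR_STATUS[code], body: { error: code } };
}
```

Taking only the code as input means a handler has no way to pass a run id or consent token into the response by accident.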

9. Seven-tier test matrix (what each tier proves — design only)

Per RULE #0. 7A-L3b ships all seven tiers under test/flow-execution-*.test.mjs, reusing flows/starter/ bundles + a malicious-step bundle + a bundle with automatable: automatable steps + a policy-forbidden automatable bundle. No network in unit tests. Every tier runs with both sub-gates toggled independently.

Tier	File	What it proves (representative cases)
unit	`test/flow-execution-unit.test.mjs`	Consent record validates `knowtation.flow_execution_consent/v0`; execution result schema validates; ordinal frontier math; `human_review` never auto-verified; sub-gate off ⇒ handlers unreachable (test hook).
integration	`test/flow-execution-parity-integration.test.mjs`	MCP `flow_run`, `POST …/runs`, and CLI `flow run start` produce deep-equal run records; advance/evidence/execute parity across three surfaces; each sub-gate off ⇒ identical disabled code.
e2e	`test/flow-execution-e2e.test.mjs`	start → consent mint → execute automatable on pinned version → evidence pointer attached → manual advance on `manual` step → submit review creates proposal; external grant bearer does not satisfy execute; import with forbidden automatable ⇒ refused.
stress	`test/flow-execution-stress.test.mjs`	many concurrent runs; idempotent execute under parallel posts; cost_consumed increments atomically; consent expiry enforced under load.
data-integrity	`test/flow-execution-data-integrity.test.mjs`	run pin preserves `flow_version` through execute; step_states ordinal order intact; SD-2 `task_ref` round-trip; export→import preserves `automatable` field but not activation.
performance	`test/flow-execution-performance.test.mjs`	start/advance within p95 on 100-step fixture; consent mint bounded; gate checks O(steps) not O(runs²).
security	`test/flow-execution-security.test.mjs`	scope denial; no existence leak; injection in `instruction` inert (never widens scope); SD-5/SD-6 separation (grant ≠ consent); cost cap enforced; no secrets in run/consent/execution records or logs; classroom policy denies automatable; import sandbox rejects policy violations.

10. Acceptance (7A-L3a)

Run-start/advance/evidence/execute/submit wire shapes, consent model, run advancement rules, automatable orchestration preconditions, SD-5 separation, import sandbox extensions, posture defaults, error taxonomy, and seven-tier test matrix are frozen here — contract only, no implementation, no route, no OpenAPI edit, no posture flip.
Ratified against FLOW-V0-SPEC.md (§1.1, §3, §6 items 5/7/9/10), FLOW-EXTERNAL-AGENT-CONTRACT-7A-L2.md (explicit non-overlap), FLOW-AUTHORING-WRITEBACK-CONTRACT-7A-L1.md (§5 import baseline), FLOW-STORE-CONTRACT-7A-10.md (runs persistence), and the consumer contract scooling/docs/FLOW-EXECUTION-LIVE-WIRE-CONTRACT-7A-L3.md.
SD-6 recorded in scooling/docs/CROSS-REPO-COORDINATION.md.
Muse-committed on feat/flow-projection-pilot; handover regenerated to point at 7A-L3b (Auto: run handlers + consent ledger + Scooling live wire + seven-tier impl, all gates default off).

Non-goals (7A-L3)

No capture flywheel (7A-L4); no MuseHub enrichment (7A-L5).
No flip of FLOW_RUN_WRITES_ENABLED, FLOW_AUTOMATABLE_EXECUTION_ENABLED, or Scooling posture constants — enabling is Tier 3.
No real cloud model provider integrations beyond orchestration stubs for test parity in 7A-L3b.
No conflation with external-agent grants (FLOW_EXTERNAL_AGENT_ENABLED unchanged).

Handoff notes (for 7A-L3b — Auto)

Branch is feat/flow-projection-pilot; this contract is Muse-committed. Always target Knowtation with muse -C ~/knowtation ….
Add lib/flow/flow-execution.mjs (run start/advance/evidence/execute/consent + policy helpers) wired to the flow store runs[].
Wire routes/MCP/CLI/OpenAPI in the same change as handlers (no docs-only PR to main).
Extend import sandbox in flow-authoring.mjs for FLOW_IMPORT_AUTOMATABLE_DENIED.
Mirror Scooling consumer contract in flowHubTransport.ts + keep createLiveFlowRunAdapter unselected while posture flags are false.
Ship all seven tiers green before handover regen; gates stay off.
