SECTION-SOURCE-HOSTED-IMPLEMENTATION-SPEC.md
| 1 | # SectionSource Hosted Implementation Spec |
| 2 | |
| 3 | ## Simple Summary |
| 4 | |
| 5 | Phase 1K specifies the future hosted MCP implementation for body-free |
| 6 | `get_section_source`. |
| 7 | |
| 8 | This phase is planning only. It does not register hosted `get_section_source`, add hosted |
| 9 | ACL entries, add Hub routes, add search or persistence, add Scooling runtime behavior, or |
| 10 | return note bodies, section bodies, snippets, full frontmatter, provider payloads, resource |
| 11 | URIs, line ranges, byte offsets, section body lengths, or absolute paths. |
| 12 | |
| 13 | ## Technical Summary |
| 14 | |
| 15 | The future hosted `get_section_source` tool must mirror the adjacent hosted one-note read |
| 16 | tools: `get_note_outline`, `get_document_tree`, and `get_metadata_facets`. |
| 17 | |
| 18 | The accepted future behavior is: |
| 19 | |
| 20 | - require an authenticated hosted MCP session |
| 21 | - pass the hosted role ACL before registration |
| 22 | - use the active hosted vault from `ctx.vaultId` |
| 23 | - use the effective canister user from `ctx.canisterUserId`, falling back to `ctx.userId` |
| 24 | - send the same canister auth headers as adjacent hosted note-read tools |
| 25 | - normalize and reject unsafe paths before any upstream fetch |
| 26 | - read exactly one note from the canister |
| 27 | - derive body-free `knowtation.section_source/v0` metadata from that note body in memory |
| 28 | - return only the SectionSource v0 allowlist |
| 29 | - sanitize invalid, missing, unauthorized, and upstream errors |
| 30 | |
| 31 | ## Planning Decision |
| 32 | |
| 33 | Phase 1K accepts the hosted implementation specification only. |
| 34 | |
| 35 | It does not approve: |
| 36 | |
| 37 | - registering hosted `get_section_source` |
| 38 | - adding `get_section_source` to hosted role ACLs |
| 39 | - adding Hub REST, OpenAPI, Hub UI, or canister routes |
| 40 | - adding search, vectors, indexes, persistence, sidecars, summaries, or memory events |
| 41 | - adding Scooling runtime behavior |
| 42 | - returning note body text |
| 43 | - returning section body text |
| 44 | - returning snippets or source excerpts |
| 45 | - returning full frontmatter |
| 46 | - returning line ranges, byte offsets, or section body lengths |
| 47 | - returning absolute paths, raw canister payloads, provider payloads, or MCP resource URIs |
| 48 | - calling PageIndex, OCR, LLMs, or external providers |
| 49 | - adding provider routing |
| 50 | |
| 51 | ## Future Hosted Tool |
| 52 | |
| 53 | A later runtime phase may register: |
| 54 | |
| 55 | ```text |
| 56 | get_section_source |
| 57 | ``` |
| 58 | |
| 59 | Registration must be guarded by: |
| 60 | |
| 61 | ```text |
| 62 | isToolAllowed('get_section_source', role) |
| 63 | ``` |
| 64 | |
| 65 | The tool must be exposed only after `mcp-tool-acl.mjs` explicitly approves it. |
| 66 | |
| 67 | ## Input Schema |
| 68 | |
| 69 | The future hosted tool may accept exactly: |
| 70 | |
| 71 | ```json |
| 72 | { |
| 73 | "path": "inbox/example.md" |
| 74 | } |
| 75 | ``` |
| 76 | |
| 77 | Field rules: |
| 78 | |
| 79 | - `path` is required. |
| 80 | - `path` must be a string. |
| 81 | - `path` must be non-empty after trimming. |
| 82 | - `path` must be vault-relative. |
| 83 | - `path` must not be POSIX absolute. |
| 84 | - `path` must not be Windows absolute. |
| 85 | - `path` must not contain traversal segments. |
| 86 | - `path` must be normalized to forward slashes before the canister read. |
| 87 | - No batch paths are accepted. |
| 88 | - No vault id, user id, role, body, snippet, search, filter, rank, provider, Scooling, |
| 89 | classroom, resource, persistence, line range, byte offset, or summary option is accepted. |
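The field rules above could be checked with a small shape validator before any path normalization runs. This is a minimal sketch, not the accepted implementation: the name `validateSectionSourceInput` is hypothetical, and it assumes unknown keys are rejected outright, since the spec says only `path` is accepted but does not state whether extra keys are rejected or ignored. Absolute-path and traversal checks are left to the normalization step.

```javascript
// Hypothetical validator for the future tool input; not an existing Knowtation API.
// Assumption: any key other than `path` is rejected rather than silently ignored.
function validateSectionSourceInput(args) {
  // The input must be a plain object, never an array (no batch paths).
  if (args === null || typeof args !== 'object' || Array.isArray(args)) {
    return { ok: false };
  }
  // Exactly one accepted key: `path`. No vault id, user id, or option keys.
  const keys = Object.keys(args);
  if (keys.length !== 1 || keys[0] !== 'path') {
    return { ok: false };
  }
  // `path` must be a string that is non-empty after trimming.
  if (typeof args.path !== 'string' || args.path.trim() === '') {
    return { ok: false };
  }
  return { ok: true, path: args.path.trim() };
}
```

The failure result deliberately carries no copy of the received value, so a caller cannot echo it into an error by accident.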
| 90 | |
| 91 | ## Hosted Role ACL Requirements |
| 92 | |
| 93 | The future runtime phase must add `get_section_source` to the hosted read-tool ACL only when |
| 94 | the implementation is added. |
| 95 | |
| 96 | The approved runtime ACL behavior is: |
| 97 | |
| 98 | - `viewer`, `editor`, `evaluator`, and `admin` may list and call the tool after the ACL entry |
| 99 | is added. |
| 100 | - Unknown roles inherit the existing hosted ACL fallback behavior and must not receive a |
| 101 | broader tool set than `viewer`. |
| 102 | - The server must not register the tool when `isToolAllowed('get_section_source', role)` |
| 103 | returns false. |
| 104 | - The tool must not be available through write-only, admin-only, prompt, resource, or Hub |
| 105 | route registration paths. |
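For the later runtime phase, the ACL rules above might look roughly like the following sketch. Only `isToolAllowed`, the tool name, and the four role names come from this spec. The ACL table, `effectiveRole`, `registerIfAllowed`, and the `register` callback are hypothetical, and the sketch assumes the existing fallback maps unknown roles to `viewer`, which the real `mcp-tool-acl.mjs` may handle differently.

```javascript
// Hypothetical ACL table; the real one lives in mcp-tool-acl.mjs and may differ.
const READ_ROLES = ['viewer', 'editor', 'evaluator', 'admin'];
const TOOL_ACL = new Map([['get_section_source', new Set(READ_ROLES)]]);

// Assumption: unknown roles fall back to `viewer`, so they never see more than viewer.
function effectiveRole(role) {
  return READ_ROLES.includes(role) ? role : 'viewer';
}

// A tool is allowed only when it has an explicit ACL entry that includes the role.
function isToolAllowed(toolName, role) {
  const allowedRoles = TOOL_ACL.get(toolName);
  return allowedRoles !== undefined && allowedRoles.has(effectiveRole(role));
}

// Register the tool only when the ACL approves it for this session's role.
// `register` stands in for whatever registration call the hosted gateway uses.
function registerIfAllowed(register, role, handler) {
  if (!isToolAllowed('get_section_source', role)) return false;
  register('get_section_source', handler);
  return true;
}
```

Because a tool without an explicit entry is always denied, removing the `get_section_source` entry from the table reproduces the Phase 1K state in which the tool is neither listed nor callable.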
| 106 | |
| 107 | Phase 1K does not add the ACL entry. |
| 108 | |
| 109 | ## Active Vault Boundary |
| 110 | |
| 111 | The future hosted tool must use only the active hosted vault from the MCP session context: |
| 112 | |
| 113 | ```text |
| 114 | ctx.vaultId |
| 115 | ``` |
| 116 | |
| 117 | Rules: |
| 118 | |
| 119 | - The client cannot supply a vault id. |
| 120 | - The request path is interpreted only inside `ctx.vaultId`. |
| 121 | - The canister read must send `X-Vault-Id: <ctx.vaultId>`. |
| 122 | - The output `path` must be the normalized request path, not a canister-supplied path. |
| 123 | - A canister response that contains another vault path, an absolute path, or a raw storage key |
| 124 | must not affect the returned path. |
| 125 | - Missing, unauthorized, and invalid responses must not reveal whether a note exists in any |
| 126 | other vault. |
| 127 | |
| 128 | ## Effective Canister User Boundary |
| 129 | |
| 130 | The future hosted tool must use the same effective canister user boundary as adjacent hosted |
| 131 | read tools: |
| 132 | |
| 133 | ```text |
| 134 | ctx.canisterUserId || ctx.userId |
| 135 | ``` |
| 136 | |
| 137 | Rules: |
| 138 | |
| 139 | - The client cannot supply a user id. |
| 140 | - The canister read must send `X-User-Id` with the effective canister user id. |
| 141 | - The implementation must not use the actor user id when a distinct effective canister user |
| 142 | id is present. |
| 143 | - The implementation must not mix SectionSource output across effective users. |
| 144 | - Errors must not reveal another user's path, note body, frontmatter, canister payload, or |
| 145 | authorization state. |
| 146 | |
| 147 | ## Canister Auth And Header Behavior |
| 148 | |
| 149 | The future hosted tool must perform the same canister note-read request shape as |
| 150 | `get_note_outline`, `get_document_tree`, and `get_metadata_facets`: |
| 151 | |
| 152 | ```text |
| 153 | GET {canisterUrl}/api/v1/notes/{encodeURIComponent(normalizedPath)} |
| 154 | ``` |
| 155 | |
| 156 | Headers: |
| 157 | |
| 158 | - `Authorization: Bearer <ctx.token>` |
| 159 | - `X-Vault-Id: <ctx.vaultId>` |
| 160 | - `X-User-Id: <effective canister user id>` |
| 161 | - `X-Gateway-Auth: <ctx.canisterAuthSecret>` when configured |
| 162 | - `Accept: application/json` |
| 163 | - `Content-Type: application/json` |
| 164 | |
| 165 | The future implementation must not forward section-specific options, provider options, |
| 166 | Scooling options, search filters, line ranges, byte offsets, or resource URIs upstream. |
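The request shape and headers above can be sketched as a pure builder that produces the URL and header set without performing the fetch. The `ctx` field names and header names are taken from this spec. The function name `buildNoteReadRequest` is hypothetical, and passing `canisterUrl` as a separate argument is an assumption, since the spec does not say where that value is configured.

```javascript
// Hypothetical helper; the ctx fields come from the spec, the function name does not.
// Assumption: canisterUrl is supplied by gateway configuration, not by the client.
function buildNoteReadRequest(canisterUrl, ctx, normalizedPath) {
  // Effective canister user: prefer ctx.canisterUserId, fall back to ctx.userId.
  const effectiveUserId = ctx.canisterUserId || ctx.userId;
  const headers = {
    Authorization: `Bearer ${ctx.token}`,
    'X-Vault-Id': ctx.vaultId,
    'X-User-Id': effectiveUserId,
    Accept: 'application/json',
    'Content-Type': 'application/json',
  };
  // The gateway secret header is sent only when a secret is configured.
  if (ctx.canisterAuthSecret) {
    headers['X-Gateway-Auth'] = ctx.canisterAuthSecret;
  }
  // No query string and no body: section-specific options are never forwarded.
  return {
    method: 'GET',
    url: `${canisterUrl}/api/v1/notes/${encodeURIComponent(normalizedPath)}`,
    headers,
  };
}
```

Keeping the builder free of I/O makes it straightforward for runtime tests to compare its output with the requests built by the adjacent hosted read tools.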
| 167 | |
| 168 | ## One-Note Read Behavior |
| 169 | |
| 170 | The future hosted tool must read one note only. |
| 171 | |
| 172 | Allowed upstream behavior: |
| 173 | |
| 174 | - one canister `GET /api/v1/notes/{path}` after path validation succeeds |
| 175 | - in-memory derivation using the already accepted SectionSource builder |
| 176 | - no write to notes, sidecars, indexes, vectors, summaries, memory, canister state, or provider |
| 177 | state |
| 178 | |
| 179 | Blocked upstream behavior: |
| 180 | |
| 181 | - `GET /api/v1/notes` list scans |
| 182 | - Hub REST calls |
| 183 | - bridge search calls |
| 184 | - index, vector, PageIndex, OCR, LLM, provider, summary, memory, import, export, or write calls |
| 185 | - Scooling calls |
| 186 | - resource registration or resource reads for SectionSource content |
| 187 | |
| 188 | ## Path Normalization And Unsafe Path Rejection |
| 189 | |
| 190 | The future hosted implementation must reject unsafe paths before the upstream canister fetch. |
| 191 | |
| 192 | The normalization algorithm must: |
| 193 | |
| 194 | - require a string |
| 195 | - trim whitespace |
| 196 | - replace backslashes with `/` |
| 197 | - reject empty paths |
| 198 | - reject paths beginning with `/` |
| 199 | - reject Windows drive paths such as `C:/Users/name/private.md` |
| 200 | - split on `/` |
| 201 | - remove empty segments caused by duplicate slashes |
| 202 | - reject any `..` segment |
| 203 | - join safe segments with `/` |
| 204 | |
| 205 | Unsafe path errors must not echo the raw unsafe path. In particular, an invalid absolute path |
| 206 | must not return `/Users/...`, `C:/...`, `\\server`, or any private local path in the MCP error. |
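The normalization steps above can be sketched as a single function that returns either a safe vault-relative path or `null`. The name `normalizeVaultPath` is hypothetical. The drive check in this sketch also rejects drive-relative forms such as `C:notes.md`, which goes slightly beyond the listed example and is an assumption.

```javascript
// Hypothetical normalizer following the listed steps; returns null for any unsafe path.
function normalizeVaultPath(input) {
  // Require a string.
  if (typeof input !== 'string') return null;
  // Trim whitespace, then replace backslashes with forward slashes.
  const slashed = input.trim().replace(/\\/g, '/');
  // Reject empty paths.
  if (slashed === '') return null;
  // Reject paths beginning with '/'. UNC paths such as \\server become //server here.
  if (slashed.startsWith('/')) return null;
  // Reject Windows drive paths such as C:/Users/name/private.md.
  if (/^[A-Za-z]:/.test(slashed)) return null;
  // Split on '/' and remove empty segments caused by duplicate slashes.
  const segments = slashed.split('/').filter((segment) => segment !== '');
  // Reject any '..' traversal segment.
  if (segments.includes('..')) return null;
  // Join the safe segments with '/'.
  return segments.join('/');
}
```

A caller would map a `null` result to the fixed `Invalid path` message without interpolating the input, which keeps raw unsafe paths out of MCP errors.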
| 207 | |
| 208 | ## Output Allowlist |
| 209 | |
| 210 | The future hosted tool may return only body-free `knowtation.section_source/v0` output: |
| 211 | |
| 212 | ```json |
| 213 | { |
| 214 | "schema": "knowtation.section_source/v0", |
| 215 | "path": "inbox/example.md", |
| 216 | "title": "Example", |
| 217 | "sections": [ |
| 218 | { |
| 219 | "section_id": "inbox-example-md:h1-example-0001", |
| 220 | "heading_id": "h1-example-0001", |
| 221 | "level": 1, |
| 222 | "heading_path": ["Example"], |
| 223 | "heading_text": "Example", |
| 224 | "child_section_ids": [], |
| 225 | "body_available": true, |
| 226 | "body_returned": false, |
| 227 | "snippet_returned": false |
| 228 | } |
| 229 | ], |
| 230 | "truncated": false |
| 231 | } |
| 232 | ``` |
| 233 | |
| 234 | Allowed top-level fields: |
| 235 | |
| 236 | - `schema` |
| 237 | - `path` |
| 238 | - `title` |
| 239 | - `sections` |
| 240 | - `truncated` |
| 241 | |
| 242 | Allowed section fields: |
| 243 | |
| 244 | - `section_id` |
| 245 | - `heading_id` |
| 246 | - `level` |
| 247 | - `heading_path` |
| 248 | - `heading_text` |
| 249 | - `child_section_ids` |
| 250 | - `body_available` |
| 251 | - `body_returned` |
| 252 | - `snippet_returned` |
| 253 | |
| 254 | Required constants: |
| 255 | |
| 256 | - `schema` must be exactly `knowtation.section_source/v0`. |
| 257 | - `body_returned` must be `false`. |
| 258 | - `snippet_returned` must be `false`. |
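One way to enforce the allowlist and required constants above is to project the builder result onto the permitted field names instead of deleting known-bad fields. The field names come from this spec. The function names `pick` and `toAllowlistedOutput` are hypothetical, and the shape of the builder result passed in is an assumption.

```javascript
// Field names come from the allowlist above; the function names are hypothetical.
const TOP_LEVEL_FIELDS = ['schema', 'path', 'title', 'sections', 'truncated'];
const SECTION_FIELDS = [
  'section_id', 'heading_id', 'level', 'heading_path', 'heading_text',
  'child_section_ids', 'body_available', 'body_returned', 'snippet_returned',
];

// Copy only allowlisted keys that are present on the source object.
function pick(source, fields) {
  const out = {};
  for (const field of fields) {
    if (Object.prototype.hasOwnProperty.call(source, field)) out[field] = source[field];
  }
  return out;
}

// Project a builder result onto the allowlist and pin the required constants.
function toAllowlistedOutput(built, normalizedPath) {
  const out = pick(built, TOP_LEVEL_FIELDS);
  out.schema = 'knowtation.section_source/v0';
  // Always the normalized request path, never a canister-supplied path.
  out.path = normalizedPath;
  out.sections = (built.sections || []).map((section) => ({
    ...pick(section, SECTION_FIELDS),
    body_returned: false,
    snippet_returned: false,
  }));
  return out;
}
```

A positive projection fails closed: a field added upstream later, such as a line range, is dropped unless it is explicitly added to the lists.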
| 259 | |
| 260 | ## Explicitly Excluded Output |
| 261 | |
| 262 | The future hosted tool must not output: |
| 263 | |
| 264 | - note body text |
| 265 | - section body text |
| 266 | - snippets |
| 267 | - source excerpts |
| 268 | - full frontmatter |
| 269 | - line ranges |
| 270 | - byte offsets |
| 271 | - section body lengths |
| 272 | - absolute filesystem paths |
| 273 | - raw canister paths |
| 274 | - raw canister payloads |
| 275 | - provider payloads |
| 276 | - provider keys |
| 277 | - rendered HTML |
| 278 | - summaries |
| 279 | - vector scores |
| 280 | - search results |
| 281 | - persistence ids |
| 282 | - sidecar paths |
| 283 | - memory events |
| 284 | - MCP resource URIs |
| 285 | - PageIndex output |
| 286 | - OCR text |
| 287 | - media metadata |
| 288 | - Scooling adapter state |
| 289 | - classroom policy state |
| 290 | |
| 291 | ## Error Sanitization |
| 292 | |
| 293 | The future hosted tool must return hosted MCP JSON errors in the existing envelope: |
| 294 | |
| 295 | ```json |
| 296 | { |
| 297 | "error": "Invalid path", |
| 298 | "code": "UPSTREAM_ERROR" |
| 299 | } |
| 300 | ``` |
| 301 | |
| 302 | The result must set `isError: true`. |
| 303 | |
| 304 | Exact error rules: |
| 305 | |
| 306 | - Missing `path` and non-string `path` return `Invalid path` without echoing the received |
| 307 | value. |
| 308 | - Unsafe paths return `Invalid path` before any upstream fetch. |
| 309 | - Missing notes return a generic upstream status class such as `Upstream 404`. |
| 310 | - Unauthorized notes return a generic upstream status class such as `Upstream 401` or |
| 311 | `Upstream 403`. |
| 312 | - Upstream runtime failures return a generic upstream failure without raw upstream response |
| 313 | bodies. |
| 314 | - Invalid Markdown or malformed canister note JSON must not return note body text, |
| 315 | frontmatter, raw canister payloads, paths from the canister response, headers, tokens, or |
| 316 | provider payloads. |
| 317 | |
| 318 | Errors must not contain: |
| 319 | |
| 320 | - note body text |
| 321 | - section body text |
| 322 | - snippets |
| 323 | - full frontmatter |
| 324 | - heading paths beyond what was already authorized in a successful response |
| 325 | - absolute paths |
| 326 | - requested unsafe paths |
| 327 | - raw canister payloads |
| 328 | - canister auth secrets |
| 329 | - bearer tokens |
| 330 | - gateway secrets |
| 331 | - provider payloads |
| 332 | - MCP resource URIs |
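The error rules above could be implemented with helpers that build every error from fixed strings and a numeric status only. The envelope fields and `isError: true` come from this spec. The helper names are hypothetical, wrapping the JSON in a single text content item is an assumption about the hosted MCP result shape, the `Upstream error` fallback message is illustrative, and the sketch assumes the same `code` is used for every error class, as the example envelope suggests.

```javascript
// Hypothetical helpers. The envelope shape and `isError: true` come from the spec;
// the text content wrapper is an assumption about the hosted MCP result shape.
function sanitizedError(message) {
  const envelope = { error: message, code: 'UPSTREAM_ERROR' };
  return {
    isError: true,
    content: [{ type: 'text', text: JSON.stringify(envelope) }],
  };
}

// Fixed message: the received value is never interpolated.
function invalidPathError() {
  return sanitizedError('Invalid path');
}

// Only a well-formed numeric status is reflected; anything else becomes generic.
function upstreamStatusError(status) {
  const safe = Number.isInteger(status) && status >= 100 && status <= 599;
  return sanitizedError(safe ? `Upstream ${status}` : 'Upstream error');
}
```

Neither helper accepts an upstream body, header, or path argument, so those values cannot reach the error text through these functions.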
| 333 | |
| 334 | ## Logging Exclusions |
| 335 | |
| 336 | The future hosted implementation must not log: |
| 337 | |
| 338 | - note body text |
| 339 | - section body text |
| 340 | - snippets |
| 341 | - full frontmatter |
| 342 | - heading text |
| 343 | - heading paths |
| 344 | - raw canister payloads |
| 345 | - requested unsafe paths |
| 346 | - absolute paths |
| 347 | - bearer tokens |
| 348 | - gateway secrets |
| 349 | - canister auth secrets |
| 350 | - provider payloads |
| 351 | - MCP resource URIs |
| 352 | |
| 353 | Bounded operational logs may include only: |
| 354 | |
| 355 | - tool name |
| 356 | - sanitized outcome class |
| 357 | - sanitized upstream status class |
| 358 | - elapsed time |
| 359 | - section count |
| 360 | - truncated flag |
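The bounded log fields above can be sketched as a record builder that constructs the log entry from typed values and ignores everything else. The function name, the field names, and the outcome labels are hypothetical. Reporting the upstream status as a class such as `4xx` is one reading of "sanitized upstream status class".

```javascript
// Hypothetical log-record builder; field names and outcome labels are illustrative.
const OUTCOMES = new Set(['ok', 'invalid_path', 'upstream_error']);

function buildLogRecord({ outcome, upstreamStatus, elapsedMs, sectionCount, truncated }) {
  const record = {
    tool: 'get_section_source',
    // Free-form outcome strings are collapsed so they cannot carry paths or headings.
    outcome: OUTCOMES.has(outcome) ? outcome : 'unknown',
  };
  // Report only the status class, for example 404 becomes "4xx".
  if (Number.isInteger(upstreamStatus)) {
    record.upstreamStatusClass = `${Math.floor(upstreamStatus / 100)}xx`;
  }
  if (Number.isFinite(elapsedMs)) record.elapsedMs = Math.round(elapsedMs);
  if (Number.isInteger(sectionCount)) record.sectionCount = sectionCount;
  if (typeof truncated === 'boolean') record.truncated = truncated;
  return record;
}
```

Because the builder destructures only the permitted inputs, extra properties such as a path or heading text passed by mistake never reach the record.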
| 361 | |
| 362 | ## Deletion, Export, And Staleness |
| 363 | |
| 364 | The future hosted tool is on-demand and non-persistent. |
| 365 | |
| 366 | Until a separate persistence spec is accepted: |
| 367 | |
| 368 | - no hosted SectionSource sidecar is created |
| 369 | - no hosted SectionSource index is created |
| 370 | - no vector record is created |
| 371 | - no memory event is created |
| 372 | - no summary record is created |
| 373 | - no provider record is created |
| 374 | - no Scooling record is created |
| 375 | - export behavior remains unchanged |
| 376 | - deleting a note leaves no SectionSource-derived hosted artifact to delete |
| 377 | - editing a note leaves no stale SectionSource-derived hosted artifact to invalidate |
| 378 | |
| 379 | If a later phase adds persistence, it must define delete, edit, export, backup, restore, |
| 380 | multi-vault isolation, stale-data invalidation, and retention behavior before implementation. |
| 381 | |
| 382 | ## Prompt-Injection Handling |
| 383 | |
| 384 | Hosted SectionSource text fields are private, untrusted source material: |
| 385 | |
| 386 | - `title` |
| 387 | - `heading_text` |
| 388 | - `heading_path` |
| 389 | - future labels, snippets, or section bodies if a later spec accepts them |
| 390 | |
| 391 | Prompt-like headings that ask a model to reveal secrets, bypass review, ignore policy, call |
| 392 | providers, exfiltrate learner data, alter grades, or disable guardrails must remain inert |
| 393 | text. They must not become tool instructions, system prompts, routing decisions, provider |
| 394 | requests, write-back approvals, or authorization overrides. |
| 395 | |
| 396 | ## Scooling Consumption Boundary |
| 397 | |
| 398 | This phase does not add Scooling runtime behavior. |
| 399 | |
| 400 | Future Scooling consumption may use hosted `get_section_source` only after: |
| 401 | |
| 402 | - the hosted runtime implementation is added and tested in Knowtation |
| 403 | - the hosted ACL explicitly exposes the tool |
| 404 | - Scooling calls through a Scooling-owned adapter |
| 405 | - Scooling preserves the body-free `knowtation.section_source/v0` allowlist |
| 406 | - Scooling treats heading text and heading paths as untrusted source material |
| 407 | |
| 408 | Scooling must not: |
| 409 | |
| 410 | - bypass Knowtation hosted authorization |
| 411 | - parse Markdown as the canonical section parser |
| 412 | - derive canonical section ids |
| 413 | - store SectionSource as truth |
| 414 | - call PageIndex, OCR, LLMs, or external providers to recreate sections |
| 415 | - expose private learner section metadata outside authorized contexts |
| 416 | - request note bodies, section bodies, snippets, resource URIs, provider payloads, line |
| 417 | ranges, byte offsets, or section body lengths through this tool |
| 418 | - use SectionSource reads as write-back approval |
| 419 | |
| 420 | ## Seven-Tier Test Requirements |
| 421 | |
| 422 | ### Unit |
| 423 | |
| 424 | - The implementation spec documents role ACL, active vault, effective canister user, canister |
| 425 | headers, one-note read, path safety, output allowlist, error, logging, lifecycle, |
| 426 | prompt-injection, and Scooling boundaries. |
| 427 | - The output allowlist matches body-free `SectionSource v0`. |
| 428 | - `body_returned` and `snippet_returned` remain false. |
| 429 | - Invalid path errors do not echo unsafe paths. |
| 430 | |
| 431 | ### Integration |
| 432 | |
| 433 | - Hosted runtime still does not register `get_section_source` in this planning phase. |
| 434 | - Hosted ACL still does not include `get_section_source` in this planning phase. |
| 435 | - Existing hosted `get_note_outline`, `get_document_tree`, and `get_metadata_facets` remain |
| 436 | the required implementation comparison points. |
| 437 | - Future runtime tests must prove the canister read uses `Authorization`, `X-Vault-Id`, |
| 438 | `X-User-Id`, and `X-Gateway-Auth` consistently with adjacent hosted read tools. |
| 439 | |
| 440 | ### End To End |
| 441 | |
| 442 | - A hosted MCP client cannot list `get_section_source` in this planning phase. |
| 443 | - A hosted MCP client cannot call `get_section_source` in this planning phase. |
| 444 | - Future runtime tests must prove a hosted MCP client can request one body-free SectionSource |
| 445 | response only after ACL and registration are added. |
| 446 | - No hosted MCP flow returns note bodies, section bodies, snippets, full frontmatter, |
| 447 | provider payloads, or resource URIs. |
| 448 | |
| 449 | ### Stress |
| 450 | |
| 451 | - Planning checks stay bounded to SectionSource docs, hosted gateway files, and contract |
| 452 | tests. |
| 453 | - Future runtime tests must prove that output for large notes stays within the accepted heading and text caps. |
| 454 | - Future runtime tests must prove repeated calls for unchanged notes are deterministic. |
| 455 | - No test scans a real vault or calls external providers. |
| 456 | |
| 457 | ### Data Integrity |
| 458 | |
| 459 | - This planning phase writes no notes, sidecars, indexes, vectors, memory, summaries, |
| 460 | provider records, Scooling records, or canister state. |
| 461 | - Future runtime tests must prove one hosted SectionSource request performs one note read and |
| 462 | no writes. |
| 463 | - Export, delete, edit, backup, and restore behavior remain unchanged in this phase. |
| 464 | |
| 465 | ### Performance |
| 466 | |
| 467 | - The future hosted tool must read one note only. |
| 468 | - The future hosted tool must not scan the whole vault. |
| 469 | - The future hosted tool must not call bridge search. |
| 470 | - The future hosted tool must not call external providers. |
| 471 | - Output size must remain bounded by accepted SectionSource caps. |
| 472 | |
| 473 | ### Security |
| 474 | |
| 475 | - Hosted runtime exposure remains blocked in this phase. |
| 476 | - Hosted ACL exposure remains blocked in this phase. |
| 477 | - No note body text appears in hosted SectionSource output. |
| 478 | - No section body text appears in hosted SectionSource output. |
| 479 | - No snippets appear in hosted SectionSource output. |
| 480 | - No full frontmatter appears in hosted SectionSource output. |
| 481 | - No absolute filesystem paths appear in hosted SectionSource output or errors. |
| 482 | - No raw canister payload appears in hosted SectionSource output or errors. |
| 483 | - No provider payload appears in hosted SectionSource output or errors. |
| 484 | - No MCP resource URI appears for hosted SectionSource content. |
| 485 | - Hub, search, persistence, Scooling, PageIndex, OCR, LLM, and provider exposure remain |
| 486 | blocked. |
| 487 | |
| 488 | ## Contract Guards |
| 489 | |
| 490 | This planning phase must add tests proving: |
| 491 | |
| 492 | - this hosted implementation spec is complete |
| 493 | - hosted runtime still does not expose `get_section_source` |
| 494 | - hosted ACL still does not include `get_section_source` |
| 495 | - hosted tools/list still omits `get_section_source` |
| 496 | - no Hub, search, persistence, Scooling, body, snippet, provider, or resource surface is |
| 497 | added for SectionSource |
| 498 | |
| 499 | ## Stop Conditions |
| 500 | |
| 501 | Stop and re-plan if hosted work requires: |
| 502 | |
| 503 | - returning note body text |
| 504 | - returning section body text |
| 505 | - returning snippets |
| 506 | - returning full frontmatter |
| 507 | - returning exact line ranges |
| 508 | - returning byte offsets |
| 509 | - returning section body lengths |
| 510 | - returning absolute paths |
| 511 | - returning raw canister payloads |
| 512 | - returning provider payloads |
| 513 | - returning MCP resource URIs |
| 514 | - adding Hub REST, OpenAPI, Hub UI, or canister routes |
| 515 | - adding search, vectors, indexes, persistence, sidecars, summaries, or memory events |
| 516 | - adding Scooling runtime behavior |
| 517 | - calling PageIndex, OCR, LLMs, or external providers |
| 518 | - weakening hosted role ACL, active vault, effective canister user, or path safety behavior |
| 519 | - logging note content, section content, headings, raw upstream payloads, auth headers, |
| 520 | gateway secrets, bearer tokens, or provider payloads |
| 521 | |
| 522 | ## Acceptance Criteria |
| 523 | |
| 524 | Phase 1K is accepted when: |
| 525 | |
| 526 | - The hosted implementation behavior is specified before runtime exposure. |
| 527 | - The future tool is limited to one vault-relative note path. |
| 528 | - The future ACL behavior is read-only and role-gated. |
| 529 | - The future canister request uses the active vault and effective canister user boundaries. |
| 530 | - The future output is limited to body-free `knowtation.section_source/v0` metadata. |
| 531 | - Errors and logs are sanitized. |
| 532 | - Deletion, export, and staleness behavior is unchanged because the tool persists nothing. |
| 533 | - Prompt-injection text remains untrusted source material. |
| 534 | - Scooling remains a downstream consumer behind its adapter boundary. |
| 535 | - Contract tests prove hosted runtime and ACL exposure remain absent in this planning phase. |
| 536 | - Contract tests prove no Hub, search, persistence, Scooling, body, snippet, provider, or |
| 537 | resource surface was added. |
| 538 | |
| 539 | ## Recommendation |
| 540 | |
| 541 | Phase 1K is the accepted planning and contract-test phase. |
| 542 | |
| 543 | Phase 1L implements the hosted MCP runtime that follows this spec. It adds hosted ACL |
| 544 | registration and hosted MCP runtime tests together. It does not add Hub REST, OpenAPI, Hub |
| 545 | UI, canister routes, search, persistence, Scooling runtime behavior, body reads, snippets, |
| 546 | summaries, PageIndex, OCR, LLM calls, provider routing, or write-back behavior. |