NOTE-OUTLINE-MVP-SPEC.md file-level

sha256:6 feat(calendar): enforce agent context tiers in retrieval API (Phase 1E)… · aaronrene · Jun 18, 2026
# Note Outline MVP Spec

## Simple Summary

This MVP adds a safe way to ask Knowtation for the headings inside one Markdown note.

The first version does not change search, indexing, memory, imports, PageIndex, Hub REST,
OpenAPI, vectors, summaries, or persistence. It only defines a read-only note outline
contract that supports Scooling adapter consumption and future tree-aware retrieval.

## Technical Summary

`NoteOutline` is a derived, read-only view over a single Markdown note body. It is built
on demand from the current note content and returns a minimal JSON shape containing the
note path, display title, heading levels, heading text, deterministic heading IDs, and a
truncation flag.

The outline is treated as note-content-derived data. If a user cannot read the note body,
the user cannot read the note outline.

## Goals

- Add a deterministic contract for reading one note's Markdown heading outline.
- Keep the first implementation local and bounded: parser first, then CLI, then MCP.
- Keep output small and safe for agent use.
- Give Scooling a stable adapter target without forcing Knowtation to ship tree search.
- Create a foundation for future section-aware retrieval without committing to storage,
  vectors, PageIndex, or hosted document processing.

## Non-Goals

- No PageIndex integration.
- No OCR.
- No PDF/DOCX outline extraction.
- No vector indexing changes.
- No search mode changes.
- No LLM summaries.
- No memory events.
- No daemon or discover-pass changes.
- No Hub REST endpoint.
- No OpenAPI route.
- No Hub UI.
- No MCP resources or resource listing.
- No canister storage change.
- No persisted sidecar files.
- No migration.
- No source snippets, body excerpts, or frontmatter in output.

## Terminology

| Term | Meaning |
| --- | --- |
| `NoteOutline` | The read-only outline of one Markdown note's headings. |
| `DocumentOutline` | Reserved future term for imported documents that are not native notes. Not part of this MVP. |
| `VaultTree` | Reserved future term for folders/projects/notes across a vault. Not part of this MVP. |
| `SectionSearch` | Reserved future term for retrieval over note sections. Not part of this MVP. |
| `PageIndexProvider` | Reserved future provider name for optional external PageIndex processing. Not part of this MVP. |

Public phase 1 naming must use `note-outline` / `note_outline`, not `tree`, `page-index`,
or `document-tree`.

## Phase Order

### Phase 0: Spec

Create and review this document. No runtime behavior changes.

### Phase 1A: Parser Only

Add a pure module and parser tests:

- `lib/note-outline.mjs`
- `test/note-outline.test.mjs`

No CLI, MCP, Hub, storage, search, or import wiring in this phase.

### Phase 1B: CLI

Add:

```text
knowtation get-note-outline <path> --json
```

This command reads one vault-relative note and returns the `NoteOutline` JSON contract.

### Phase 1C: Self-Hosted MCP

Add local MCP tool:

```text
get_note_outline
```

The tool mirrors CLI semantics and returns the same JSON shape.

### Phase 1D: Hosted MCP

Add hosted MCP tool only after parser, CLI, local MCP, and security tests pass:

```text
get_note_outline
```

Hosted implementation reads the note through the same canister path and headers as
`get_note`, then derives the outline in the gateway session.

## Implementation Status

Status as of 2026-05-24 on Muse `main`:

| Phase | Status | Muse commit | Verification |
| --- | --- | --- | --- |
| Phase 0: Spec | Complete | `sha256:f223a66c467b` | Spec committed before runtime changes. |
| Phase 1A: Parser Only | Complete | `sha256:b584f61cbf00` | Parser tests cover block-aware Markdown behavior, caps, data integrity, performance, and security output boundaries. |
| Phase 1B: CLI | Complete | `sha256:91f5cde8cca6` | `get-note-outline <path> --json` returns the `NoteOutline` contract without body text or full frontmatter. |
| Phase 1C: Self-Hosted MCP | Complete | `sha256:971609defff9` | Self-hosted `get_note_outline` mirrors CLI semantics and uses the same safe JSON contract. |
| Phase 1D: Hosted MCP | Complete | `sha256:9e4301d69902` | Hosted `get_note_outline` is viewer/read-level, uses the same canister read path as `get_note`, and has tests for missing/forbidden notes, no outline resource exposure, and unsafe upstream path leakage. |

Full local verification after Phase 1D passed with:

```text
npm test
```

The local `config/local.yaml` indentation issue was repaired outside Muse history so the
test suite could load local configuration. That private config repair is not part of the
feature commits.

This work has been merged into local Muse `main`. Remote staging push remains blocked by
the ongoing Muse authentication redevelopment, so local `main` is the current source of
truth for follow-on Knowtation work.

## Deferred Phases

The following are explicitly deferred:

- `POST /api/v1/notes/outline`
- `docs/openapi.yaml` changes
- `knowtation://...` outline resources
- Hub UI display
- note section retrieval
- `DocumentTree` runtime implementation
- outline persistence
- vector payload fields
- PageIndex provider
- OCR provider
- section summaries
- line range exposure

The follow-on `DocumentTree v0` planning contract is documented separately in
`docs/DOCUMENT-TREE-V0-SPEC.md` and has since shipped through CLI, self-hosted MCP, and
hosted MCP read surfaces.

## JSON Contract

### Success Shape

```json
{
  "schema": "knowtation.note_outline/v1",
  "path": "inbox/example.md",
  "title": "Example",
  "headings": [
    {
      "level": 1,
      "text": "Introduction",
      "id": "h1-introduction-0001"
    }
  ],
  "truncated": false
}
```

### Field Rules

| Field | Type | Required | Rule |
| --- | --- | --- | --- |
| `schema` | string | Yes | Must be exactly `knowtation.note_outline/v1` for this MVP. |
| `path` | string | Yes | Vault-relative note path. Never absolute. |
| `title` | string or null | Yes | Display title from frontmatter or path-derived title. No full frontmatter object. |
| `headings` | array | Yes | Ordered list of heading records. Empty when the note has no headings. |
| `truncated` | boolean | Yes | True when caps prevent returning all headings. |

Heading record:

| Field | Type | Required | Rule |
| --- | --- | --- | --- |
| `level` | number | Yes | Markdown heading depth, 1 through 6. |
| `text` | string | Yes | Plain heading text after Markdown inline text extraction. |
| `id` | string | Yes | Deterministic, versioned-by-contract heading ID for this response. |

### Explicitly Excluded Fields

The MVP response must not include:

- note body
- snippets
- source excerpts
- full frontmatter
- provider keys
- absolute filesystem paths
- raw HTML rendering
- byte offsets
- exact line ranges
- section body lengths
- LLM summaries
- vector scores
- memory events

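The field rules and exclusions above can be checked mechanically by a consumer. The following is a minimal sketch, not the repository's validator: `validateNoteOutline` and its helper names are hypothetical, and it assumes that any key outside the documented fields counts as an excluded field.

```javascript
// Hypothetical consumer-side check of the v1 field rules (names are illustrative).
const NOTE_OUTLINE_SCHEMA = "knowtation.note_outline/v1";
const TOP_LEVEL_KEYS = ["schema", "path", "title", "headings", "truncated"];
const HEADING_KEYS = ["level", "text", "id"];

function isPlainObject(value) {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function hasOnlyKeys(obj, allowed) {
  return Object.keys(obj).every((key) => allowed.includes(key));
}

function looksAbsolute(path) {
  // POSIX root, leading backslash (UNC), or a Windows drive prefix.
  return /^(\/|\\|[A-Za-z]:[\\/])/.test(path);
}

// Returns a list of problems; an empty list means the value matches the contract.
function validateNoteOutline(value) {
  if (!isPlainObject(value)) return ["outline must be an object"];
  const problems = [];
  if (!hasOnlyKeys(value, TOP_LEVEL_KEYS)) problems.push("unexpected top-level field");
  if (value.schema !== NOTE_OUTLINE_SCHEMA) problems.push("schema mismatch");
  if (typeof value.path !== "string" || looksAbsolute(value.path)) {
    problems.push("path must be vault-relative");
  }
  if (typeof value.title !== "string" && value.title !== null) {
    problems.push("title must be string or null");
  }
  if (typeof value.truncated !== "boolean") problems.push("truncated must be boolean");
  if (!Array.isArray(value.headings)) {
    problems.push("headings must be an array");
    return problems;
  }
  value.headings.forEach((heading, index) => {
    if (!isPlainObject(heading) || !hasOnlyKeys(heading, HEADING_KEYS)) {
      problems.push(`heading ${index}: unexpected shape`);
      return;
    }
    if (!Number.isInteger(heading.level) || heading.level < 1 || heading.level > 6) {
      problems.push(`heading ${index}: level must be 1 through 6`);
    }
    if (typeof heading.text !== "string") problems.push(`heading ${index}: text must be string`);
    if (typeof heading.id !== "string") problems.push(`heading ${index}: id must be string`);
  });
  return problems;
}
```

Rejecting unknown keys is stricter than the table requires, but it turns the excluded-field list into a testable property: a response carrying `body` or `frontmatter` fails the check.
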
## Error Contract

CLI `--json` errors keep the existing shape:

```json
{ "error": "message", "code": "ERROR_CODE" }
```

MCP errors keep the existing JSON text error pattern used by other MCP tools.

Hosted missing-note and unauthorized-note behavior must not reveal more information than
the existing hosted `get_note` path already reveals. If future role/scope behavior becomes
stricter than body reads, outline reads must follow the stricter rule.

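As an illustration of the CLI error shape, the sketch below serializes a failure into the two documented fields only. The function name `toCliJsonError` and the `RUNTIME_ERROR` and `NOT_FOUND` codes are assumptions made for this example; the spec does not list the real error codes.

```javascript
// Hypothetical helper: emit only the two contract fields, `error` and `code`.
function toCliJsonError(err) {
  // "RUNTIME_ERROR" is an assumed default code, not one defined by this spec.
  const code = typeof err.code === "string" ? err.code : "RUNTIME_ERROR";
  const message = typeof err.message === "string" ? err.message : "Unknown error";
  // Stack traces, absolute paths, and upstream bodies are never copied over.
  return JSON.stringify({ error: message, code });
}

const failure = new Error("Note not found");
failure.code = "NOT_FOUND"; // assumed code, for illustration only
console.log(toCliJsonError(failure));
// {"error":"Note not found","code":"NOT_FOUND"}
```

Building the object field by field, rather than spreading the caught error, keeps accidental properties such as `stack` out of the output.
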
## Parser Decision

The parser must use a Markdown parser with block awareness and source positions. Regex-only
parsing is not acceptable for this feature.

Recommended dependency direction:

```text
unified + remark-parse
```

Reasons:

- Parses CommonMark into an mdast tree.
- Provides heading nodes rather than raw line matches.
- Avoids false headings inside fenced code blocks.
- Supports Setext headings.
- Provides position data if future local-only ranges are added.
- ESM-only packages align with this repository's `"type": "module"`.

Alternative:

```text
micromark
```

`micromark` is lower-level and precise, but requires more custom token handling. It should
be selected only if the implementation needs lower-level token control.

Before adding the dependency, run normal package-manager installation so `package.json`
and `package-lock.json` stay in sync. Do not hand-edit dependency versions.

## Markdown Behavior

### Must Support

- YAML frontmatter at the start of a note. Frontmatter is not outline content.
- ATX headings: `#` through `######`.
- Optional closing hashes: `## Title ##`.
- Setext headings:

  ```markdown
  Title
  =====

  Subtitle
  --------
  ```

- Duplicate headings.
- Empty heading text.
- Inline formatting inside headings.
- Links, images, code spans, escaped characters, and emphasis inside headings.
- CRLF and LF line endings.
- Notes with no headings.
- Empty notes.
- Large notes up to the configured cap.

### Must Not Treat As Headings

- Heading-like text inside fenced code blocks.
- Heading-like text inside indented code blocks.
- Heading-like text inside raw HTML blocks unless the parser returns a normal Markdown
  heading node.
- YAML frontmatter keys.

### Explicitly Deferred Or Unsupported

- MDX/JSX heading semantics.
- Custom HTML heading extraction from `<h1>` / `<h2>` tags.
- Notebook-style cell metadata.
- PDF page headings.
- OCR-derived headings.
- Wikilink graph hierarchy.

## Heading Text Normalization

Heading text must be plain text, not rendered HTML.

Rules:

- Strip Markdown formatting syntax through AST text extraction.
- Preserve visible text content.
- Normalize internal whitespace to a single space.
- Trim leading and trailing whitespace.
- Treat HTML or script-looking content as text, never executable markup.

Example:

```markdown
## **Bold** [Link](https://example.com) `code`
```

Expected text:

```text
Bold Link code
```

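The normalization rules can be sketched as a walk over an mdast-style heading node. This is an illustrative sketch, not the shipped parser: the tree below is built by hand from plain objects instead of coming from `remark-parse`, and the function names are hypothetical.

```javascript
// Collect visible text from an mdast-style node. The node shapes follow the
// mdast convention: `value` on text/inlineCode/html, `alt` on image, `children` elsewhere.
function collectText(node) {
  if (node.type === "text" || node.type === "inlineCode") return node.value;
  if (node.type === "html") return node.value; // kept as literal text, never rendered
  if (node.type === "image") return node.alt || "";
  if (node.type === "break") return " ";
  if (Array.isArray(node.children)) return node.children.map(collectText).join("");
  return "";
}

// Collapse internal whitespace to single spaces and trim both ends.
function normalizeHeadingText(headingNode) {
  return collectText(headingNode).replace(/\s+/g, " ").trim();
}

// Hand-built tree for: ## **Bold** [Link](https://example.com) `code`
const exampleHeading = {
  type: "heading",
  depth: 2,
  children: [
    { type: "strong", children: [{ type: "text", value: "Bold" }] },
    { type: "text", value: " " },
    { type: "link", url: "https://example.com", children: [{ type: "text", value: "Link" }] },
    { type: "text", value: " " },
    { type: "inlineCode", value: "code" },
  ],
};

console.log(normalizeHeadingText(exampleHeading));
// Bold Link code
```

Because the walk only reads `value`, `alt`, and `children`, link URLs and emphasis markers never reach the output, and inline HTML survives only as a literal string.
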
## Heading ID Contract

IDs are deterministic within one outline response and stable for the same path and same
heading sequence.

Recommended MVP format:

```text
h<level>-<slug>-<ordinal>
```

Example:

```text
h2-install-0002
```

Rules:

- `level` is the Markdown heading depth.
- `slug` is normalized from heading text using the same conservative slug discipline as
  Knowtation project/tag slugs where practical.
- `ordinal` is the one-based heading occurrence index in document order, zero-padded to
  four digits.
- Duplicate headings receive distinct ordinals.
- IDs are not persisted.
- IDs are not promised to survive heading reordering or major parser changes.

If future versions need stronger stability across edits, introduce a new schema version.

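A minimal sketch of the `h<level>-<slug>-<ordinal>` format follows. The slug rule here (lowercase ASCII letters and digits, every other run collapsed to one hyphen) and the `untitled` fallback for empty heading text are assumptions; the actual Knowtation project/tag slug discipline is not reproduced in this spec.

```javascript
// Assumed slug rule: lowercase, non-alphanumeric runs become one hyphen, edges trimmed.
function slugifyHeading(text) {
  const slug = text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return slug || "untitled"; // hypothetical fallback for empty heading text
}

// ordinal is the one-based position in document order, zero-padded to four digits.
function headingId(level, text, ordinal) {
  return `h${level}-${slugifyHeading(text)}-${String(ordinal).padStart(4, "0")}`;
}

// Assign IDs to an ordered list of { level, text } records without mutating the input.
function assignHeadingIds(headings) {
  return headings.map((heading, index) => ({
    ...heading,
    id: headingId(heading.level, heading.text, index + 1),
  }));
}

const withIds = assignHeadingIds([
  { level: 1, text: "Introduction" },
  { level: 2, text: "Install" },
  { level: 2, text: "Install" },
]);
console.log(withIds.map((heading) => heading.id));
// [ 'h1-introduction-0001', 'h2-install-0002', 'h2-install-0003' ]
```

The ordinal counts every heading in the document, not only repeats of the same text, which is why the two `Install` headings end in `0002` and `0003` and why reordering headings changes their IDs.
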
## Caps And Truncation

The parser must cap work to prevent accidental expensive calls on huge imported notes.

Initial recommended caps:

- max input characters parsed: 1,000,000
- max headings returned: 500

If input exceeds the character cap, parse only if the parser behavior is still safe and
bounded. Otherwise return a runtime error with a clear message.

If headings exceed the heading cap:

- return only the first capped set in document order
- set `truncated: true`

Caps must be constants in the parser module and covered by tests.

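The two caps can be sketched as module constants with small guard functions. This sketch takes the error branch for oversized input and does not attempt a bounded partial parse; the constant and function names are hypothetical, and `length` counts UTF-16 code units as a stand-in for characters.

```javascript
// Caps from the spec, expressed as constants (names are illustrative).
const MAX_INPUT_CHARS = 1_000_000;
const MAX_HEADINGS = 500;

// Reject oversized input before any parsing work is done.
// String length counts UTF-16 code units, used here as an approximation of characters.
function assertWithinInputCap(markdown) {
  if (markdown.length > MAX_INPUT_CHARS) {
    throw new RangeError(`Note exceeds the ${MAX_INPUT_CHARS} character outline cap`);
  }
}

// Keep the first MAX_HEADINGS headings in document order and report truncation.
function capHeadings(headings) {
  const truncated = headings.length > MAX_HEADINGS;
  return {
    headings: truncated ? headings.slice(0, MAX_HEADINGS) : headings,
    truncated,
  };
}

const many = Array.from({ length: 501 }, (_, index) => ({ level: 2, text: `Item ${index + 1}` }));
const capped = capHeadings(many);
console.log(capped.headings.length, capped.truncated);
// 500 true
```

Checking the input cap before parsing is what bounds the work; the heading cap only bounds the size of the response.
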
## Security Invariants

### General

- Outline is note-content-derived data.
- A caller must be allowed to read the note before reading the outline.
- The response must never include the note body.
- The response must never include full frontmatter.
- The response must never include absolute paths.
- The response must never render heading Markdown into HTML.
- The response must never execute or trust content from headings.
- Logs must not include heading text, body text, secrets, or raw upstream responses.

### Local CLI And Self-Hosted MCP

- Resolve paths with existing vault path safety helpers.
- Only read files under the configured vault root.
- Respect existing note read behavior.
- Do not read `.env`, `config/local.yaml`, `data/`, or any ignored/non-vault file.

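The existing vault path safety helpers are not shown in this spec, so the following is only a hypothetical pre-check that illustrates the kind of input the local surfaces should refuse. It assumes notes use forward-slash, vault-relative paths ending in `.md`; a real implementation would still resolve the path against the vault root before reading.

```javascript
// Hypothetical pre-check, not the repository's helper: accept only clean,
// vault-relative Markdown paths and refuse anything that could escape the vault.
function isSafeVaultRelativePath(notePath) {
  if (typeof notePath !== "string" || notePath.length === 0) return false;
  // Refuse NUL bytes and backslashes so separators cannot be smuggled in.
  if (notePath.includes("\0") || notePath.includes("\\")) return false;
  // Refuse POSIX absolute paths and Windows drive prefixes.
  if (notePath.startsWith("/") || /^[A-Za-z]:/.test(notePath)) return false;
  // Refuse empty, current-directory, and parent-directory segments.
  const segments = notePath.split("/");
  if (segments.some((segment) => segment === "" || segment === "." || segment === "..")) {
    return false;
  }
  // Assumption: outline reads target Markdown notes only.
  return notePath.toLowerCase().endsWith(".md");
}

console.log(isSafeVaultRelativePath("inbox/example.md")); // true
console.log(isSafeVaultRelativePath("../config/local.yaml")); // false
```

A string-level check like this cannot see symlinks, so it complements, and does not replace, resolving the final path and confirming it stays under the configured vault root.
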
### Hosted MCP

- Use the same effective canister user as `get_note`.
- Use the active `X-Vault-Id`.
- Include gateway/canister auth headers exactly like existing hosted note reads.
- Do not expose outlines through `resources/list`.
- Do not add outline resource URIs in phase 1.
- Viewer can read outlines only for notes the viewer can already read.
- Editor does not get broader outline visibility than viewer.
- Admin follows existing admin note-read behavior.
- Evaluator behavior must be explicitly tested before enabling hosted outline access for
  evaluator sessions.

## Memory, Daemon, And Discover Interaction

This MVP does not write memory events.

Rationale:

- Memory records activity over time.
- NoteOutline is a derived view of current note content.
- Duplicating heading text into memory creates unnecessary leakage and stale-data risk.

Future phases may record coarse lifecycle events such as `note_outline_read` only after a
separate privacy review. That event is not part of this MVP.

## Imports And PageIndex Interaction

This MVP does not change imports.

Existing imports can produce Markdown notes. The outline parser can read those notes after
import because they are normal vault content.

PageIndex remains deferred. When a future `PageIndexProvider` exists, it must normalize
provider output into a Knowtation-owned format. It must not become the source of truth.

Before any PageIndex provider ships, there must be a separate consent, retention, deletion,
audit, and provider-key spec.

## Scooling Interaction

Scooling can use this as a future adapter target:

```text
KnowtationVaultAdapter.getNoteOutline(path)
```

Scooling must not parse Markdown itself as the source of truth. Scooling can display a
placeholder until Knowtation exposes the relevant surface.

## Test Matrix

### Unit

- Frontmatter ignored as outline content.
- ATX headings parse correctly.
- Setext headings parse correctly.
- Fenced code block headings are ignored.
- Indented code block headings are ignored.
- Duplicate headings receive deterministic distinct IDs.
- Inline heading formatting becomes plain text.
- Malicious HTML/script-like heading text stays plain text.
- Empty note returns an empty headings array.
- No-heading note returns an empty headings array.
- CRLF input is handled.
- Heading cap sets `truncated: true`.

### Integration

- CLI reads a fixture vault note and returns valid JSON.
- CLI rejects missing paths.
- CLI rejects traversal paths.
- Self-hosted MCP tool returns the same JSON shape as CLI.

### End To End

- Scooling-facing adapter tests can later use the CLI/MCP shape without note bodies.
- Not part of parser-only phase.

### Stress

- Large Markdown note stays within parser time and memory budget.
- Many headings are capped deterministically.

### Data Integrity

- Parser output does not mutate notes.
- Parser output does not write sidecars.
- Parser output does not change vectors, memory, or indexes.
- IDs are deterministic for repeated calls with identical input.

### Performance

- Parser is linear or near-linear for normal Markdown fixtures.
- Huge-input cap prevents unbounded work.

### Security

- No body text in output.
- No full frontmatter in output.
- No absolute path in output.
- Path traversal fails.
- Hosted outline uses the same vault/user headers as `get_note`.
- Hosted unauthorized and missing notes do not leak extra information beyond existing
  note-read behavior.
- Hosted viewer cannot read outlines outside active vault/scope.
- Tool listing role tests include `get_note_outline` only when enabled for that role.

## REST And Scooling Bridge Update

The separately reviewed REST slice adds `GET /api/v1/note-outline?path=...` for
self-hosted Hub and hosted gateway. The route is auth-gated, one-note bounded, and
returns only the existing `knowtation.note_outline/v1` JSON contract.

The Scooling smoke bridge adds `GET /scooling/note-outline/smoke?path=...`. It is
disabled by default, limited to local or staging smoke validation, owns the upstream
bearer token, rejects credentials supplied by Scooling, validates the upstream
body-free payload, and returns only the raw `NoteOutline` JSON Scooling can validate.

This REST/bridge slice does not add note body output, snippets, full frontmatter,
absolute paths, MCP resources, search, vectors, PageIndex, OCR, persistence, summaries,
or write-back.

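Because the REST route takes the note path as a query parameter, a client has to encode it before sending. The sketch below builds the request URL with the standard `URL` and `URLSearchParams` APIs; the function name and the `http://localhost:3000` base are placeholders, not documented Hub addresses.

```javascript
// Hypothetical client helper: build the one-note outline URL with an encoded path.
function noteOutlineUrl(baseUrl, notePath) {
  const url = new URL("/api/v1/note-outline", baseUrl);
  // searchParams.set applies form-style percent-encoding to the value.
  url.searchParams.set("path", notePath);
  return url.toString();
}

// The base URL is a placeholder for a local Hub instance.
console.log(noteOutlineUrl("http://localhost:3000", "inbox/example.md"));
// http://localhost:3000/api/v1/note-outline?path=inbox%2Fexample.md
```

Encoding through `searchParams` keeps characters such as `/`, `&`, and spaces inside the single `path` value, so a note name cannot add extra query parameters to the request.
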
## Files To Modify By Phase

### Phase 1A

- `package.json`
- `package-lock.json`
- `lib/note-outline.mjs`
- `test/note-outline.test.mjs`

### Phase 1B

- `cli/index.mjs`
- `test/cli.test.mjs`
- `docs/SPEC.md`
- `docs/CLI-JSON-SCHEMA.md`
- `docs/RETRIEVAL-AND-CLI-REFERENCE.md`

### Phase 1C

- `mcp/create-server.mjs`
- local MCP tests
- `docs/AGENT-INTEGRATION.md`

### Phase 1D

- `hub/gateway/mcp-hosted-server.mjs`
- `hub/gateway/mcp-tool-acl.mjs`
- `test/mcp-hosted-tools-list.test.mjs`
- hosted MCP security/parity tests
- `docs/PARITY-MATRIX-HOSTED.md`

## Stop Conditions

Stop and re-plan if any of the following become necessary:

- returning note body text
- returning line ranges in hosted output
- broadening the REST route beyond one authorized path
- accepting credentials from Scooling
- returning a transport envelope that differs from the raw `NoteOutline` JSON contract
- changing search/index/vector behavior
- adding persistence
- adding PageIndex
- adding OCR
- adding LLM summaries
- changing canister storage
- weakening hosted scope behavior
- exposing outline resources through MCP resource listings

## Acceptance Criteria

The MVP is acceptable only when:

- The spec is reviewed and accepted.
- Parser tests are written before parser implementation.
- Parser uses a block-aware Markdown parser.
- CLI and MCP surfaces return the same JSON contract.
- Hosted MCP access is gated exactly like note body reads.
- No runtime feature writes derived outline data.
- No output includes body text, full frontmatter, absolute paths, or secrets.
- Seven-tier tests are present for shipped phases.

## Recommendation

Completed MVP implementation sequence:

1. Spec review.
2. Parser tests.
3. Parser module.
4. CLI command.
5. Self-hosted MCP tool.
6. Hosted MCP tool after security tests pass.
7. Auth-gated REST route and disabled-by-default Scooling smoke bridge after the REST
   safety review passes.

Next, continue Knowtation development from local Muse `main` while remote staging
authentication is unavailable.

Do not begin PageIndex, section search, summaries, persistence, or broader REST expansion
as a bundled follow-on. Each of those needs a separate review pass, explicit scope, and
tests before implementation.