
Note Outline MVP Spec

Simple Summary

This MVP adds a safe way to ask Knowtation for the headings inside one Markdown note.

The first version does not change search, indexing, memory, imports, PageIndex, Hub REST, OpenAPI, vectors, summaries, or persistence. It only defines a read-only note outline contract that supports Scooling adapter consumption and future tree-aware retrieval.

Technical Summary

NoteOutline is a derived, read-only view over a single Markdown note body. It is built on demand from the current note content and returns a minimal JSON shape containing the note path, display title, heading levels, heading text, deterministic heading IDs, and a truncation flag.

The outline is treated as note-content-derived data. If a user cannot read the note body, the user cannot read the note outline.

Goals

  • Add a deterministic contract for reading one note's Markdown heading outline.
  • Keep the first implementation local and bounded: parser first, then CLI, then MCP.
  • Keep output small and safe for agent use.
  • Give Scooling a stable adapter target without forcing Knowtation to ship tree search.
  • Create a foundation for future section-aware retrieval without committing to storage, vectors, PageIndex, or hosted document processing.

Non-Goals

  • No PageIndex integration.
  • No OCR.
  • No PDF/DOCX outline extraction.
  • No vector indexing changes.
  • No search mode changes.
  • No LLM summaries.
  • No memory events.
  • No daemon or discover-pass changes.
  • No Hub REST endpoint.
  • No OpenAPI route.
  • No Hub UI.
  • No MCP resources or resource listing.
  • No canister storage change.
  • No persisted sidecar files.
  • No migration.
  • No source snippets, body excerpts, or frontmatter in output.

Terminology

| Term | Meaning |
| --- | --- |
| NoteOutline | The read-only outline of one Markdown note's headings. |
| DocumentOutline | Reserved future term for imported documents that are not native notes. Not part of this MVP. |
| VaultTree | Reserved future term for folders/projects/notes across a vault. Not part of this MVP. |
| SectionSearch | Reserved future term for retrieval over note sections. Not part of this MVP. |
| PageIndexProvider | Reserved future provider name for optional external PageIndex processing. Not part of this MVP. |

Public phase 1 naming must use note-outline / note_outline, not tree, page-index, or document-tree.

Phase Order

Phase 0: Spec

Create and review this document. No runtime behavior changes.

Phase 1A: Parser Only

Add a pure module and parser tests:

  • lib/note-outline.mjs
  • test/note-outline.test.mjs

No CLI, MCP, Hub, storage, search, or import wiring in this phase.

Phase 1B: CLI

Add:

knowtation get-note-outline <path> --json

This command reads one vault-relative note and returns the NoteOutline JSON contract.

Phase 1C: Self-Hosted MCP

Add a local MCP tool:

get_note_outline

The tool mirrors CLI semantics and returns the same JSON shape.

Phase 1D: Hosted MCP

Add the hosted MCP tool only after the parser, CLI, local MCP, and security tests pass:

get_note_outline

The hosted implementation reads the note through the same canister path and headers as get_note, then derives the outline in the gateway session.

Implementation Status

Status as of 2026-05-24 on Muse main:

| Phase | Status | Muse commit | Verification |
| --- | --- | --- | --- |
| Phase 0: Spec | Complete | sha256:f223a66c467b | Spec committed before runtime changes. |
| Phase 1A: Parser Only | Complete | sha256:b584f61cbf00 | Parser tests cover block-aware Markdown behavior, caps, data integrity, performance, and security output boundaries. |
| Phase 1B: CLI | Complete | sha256:91f5cde8cca6 | `get-note-outline <path> --json` returns the NoteOutline contract without body text or full frontmatter. |
| Phase 1C: Self-Hosted MCP | Complete | sha256:971609defff9 | Self-hosted get_note_outline mirrors CLI semantics and uses the same safe JSON contract. |
| Phase 1D: Hosted MCP | Complete | sha256:9e4301d69902 | Hosted get_note_outline is viewer/read-level, uses the same canister read path as get_note, and has tests for missing/forbidden notes, no outline resource exposure, and unsafe upstream path leakage. |

Full local verification after Phase 1D passed with:

npm test

The local config/local.yaml indentation issue was repaired outside Muse history so the test suite could load local configuration. That private config repair is not part of the feature commits.

This work has been merged into local Muse main. Remote staging push remains blocked by the ongoing Muse authentication redevelopment, so local main is the current source of truth for follow-on Knowtation work.

Deferred Phases

The following are explicitly deferred:

  • POST /api/v1/notes/outline
  • docs/openapi.yaml changes
  • knowtation://... outline resources
  • Hub UI display
  • note section retrieval
  • DocumentTree runtime implementation
  • outline persistence
  • vector payload fields
  • PageIndex provider
  • OCR provider
  • section summaries
  • line range exposure

The follow-on DocumentTree v0 planning contract is documented separately in docs/DOCUMENT-TREE-V0-SPEC.md and has since shipped through CLI, self-hosted MCP, and hosted MCP read surfaces.

JSON Contract

Success Shape

{
  "schema": "knowtation.note_outline/v1",
  "path": "inbox/example.md",
  "title": "Example",
  "headings": [
    {
      "level": 1,
      "text": "Introduction",
      "id": "h1-introduction-0001"
    }
  ],
  "truncated": false
}

Field Rules

| Field | Type | Required | Rule |
| --- | --- | --- | --- |
| schema | string | Yes | Must be exactly knowtation.note_outline/v1 for this MVP. |
| path | string | Yes | Vault-relative note path. Never absolute. |
| title | string or null | Yes | Display title from frontmatter or path-derived title. No full frontmatter object. |
| headings | array | Yes | Ordered list of heading records. Empty when the note has no headings. |
| truncated | boolean | Yes | True when caps prevent returning all headings. |

Heading record:

| Field | Type | Required | Rule |
| --- | --- | --- | --- |
| level | number | Yes | Markdown heading depth, 1 through 6. |
| text | string | Yes | Plain heading text after Markdown inline text extraction. |
| id | string | Yes | Deterministic, versioned-by-contract heading ID for this response. |

Explicitly Excluded Fields

The MVP response must not include:

  • note body
  • snippets
  • source excerpts
  • full frontmatter
  • provider keys
  • absolute filesystem paths
  • raw HTML rendering
  • byte offsets
  • exact line ranges
  • section body lengths
  • LLM summaries
  • vector scores
  • memory events
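
The field rules and exclusions above can be checked mechanically. The following is a minimal sketch of such a check, not the repository's validator: `isNoteOutline` and `hasExactKeys` are hypothetical names, and the strict key allow-list is one possible way to enforce the exclusion list.

```javascript
// Hypothetical helper names; the allow-lists mirror the Field Rules tables.
const ROOT_KEYS = ["schema", "path", "title", "headings", "truncated"];
const HEADING_KEYS = ["level", "text", "id"];

// True when obj has exactly the given keys, no more and no fewer.
function hasExactKeys(obj, keys) {
  const actual = Object.keys(obj).sort();
  const expected = [...keys].sort();
  return actual.length === expected.length &&
    expected.every((key, i) => key === actual[i]);
}

function isNoteOutline(value) {
  if (value === null || typeof value !== "object" || Array.isArray(value)) return false;
  // An allow-list rejects body, snippets, frontmatter, and any other extra field.
  if (!hasExactKeys(value, ROOT_KEYS)) return false;
  if (value.schema !== "knowtation.note_outline/v1") return false;
  // Vault-relative only: reject leading slashes and Windows drive prefixes.
  if (typeof value.path !== "string" || /^(?:[\\\/]|[A-Za-z]:)/.test(value.path)) return false;
  if (value.title !== null && typeof value.title !== "string") return false;
  if (typeof value.truncated !== "boolean") return false;
  if (!Array.isArray(value.headings)) return false;
  return value.headings.every((h) =>
    h !== null && typeof h === "object" && hasExactKeys(h, HEADING_KEYS) &&
    Number.isInteger(h.level) && h.level >= 1 && h.level <= 6 &&
    typeof h.text === "string" && typeof h.id === "string");
}

const sample = {
  schema: "knowtation.note_outline/v1",
  path: "inbox/example.md",
  title: "Example",
  headings: [{ level: 1, text: "Introduction", id: "h1-introduction-0001" }],
  truncated: false,
};
console.log(isNoteOutline(sample)); // true
console.log(isNoteOutline({ ...sample, body: "# Introduction" })); // false
```

An allow-list fails closed: a field added later by mistake is rejected even if nobody remembered to add it to a deny-list.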

Error Contract

CLI --json errors keep the existing shape:

{ "error": "message", "code": "ERROR_CODE" }

MCP errors keep the existing JSON text error pattern used by other MCP tools.

Hosted missing-note and unauthorized-note behavior must not reveal more information than the existing hosted get_note path already reveals. If future role/scope behavior becomes stricter than body reads, outline reads must follow the stricter rule.

Parser Decision

The parser must use a Markdown parser with block awareness and source positions. Regex-only parsing is not acceptable for this feature.

Recommended dependency direction:

unified + remark-parse

Reasons:

  • Parses CommonMark into an mdast tree.
  • Provides heading nodes rather than raw line matches.
  • Avoids false headings inside fenced code blocks.
  • Supports Setext headings.
  • Provides position data if future local-only ranges are added.
  • ESM-only packages align with this repository's "type": "module".

Alternative:

micromark

micromark is lower-level and precise, but requires more custom token handling. It should be selected only if the implementation needs lower-level token control.

Before adding the dependency, run normal package-manager installation so package.json and package-lock.json stay in sync. Do not hand-edit dependency versions.
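
To illustrate why an mdast tree is preferred over raw line matching, here is a minimal sketch of collecting heading nodes from a tree. It assumes the tree was produced elsewhere (for example by unified with remark-parse) and uses a hand-written tree in the mdast shape so the example has no dependencies. `collectHeadingNodes` is a hypothetical name.

```javascript
// Depth-first walk over an mdast-shaped tree (nodes have `type`, optional
// `children`, and for headings a `depth`). How the tree is produced is an
// assumption and is not shown here.
function collectHeadingNodes(node, out = []) {
  if (node.type === "heading") {
    out.push(node);
    return out; // heading children are inline content only
  }
  for (const child of node.children ?? []) {
    collectHeadingNodes(child, out);
  }
  return out;
}

// Hand-written tree for a note with "# Intro", then a fenced code block
// whose content is "# not a heading", then "## Install".
const tree = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Intro" }] },
    { type: "code", lang: null, value: "# not a heading" },
    { type: "heading", depth: 2, children: [{ type: "text", value: "Install" }] },
  ],
};

const depths = collectHeadingNodes(tree).map((h) => h.depth);
console.log(depths); // [ 1, 2 ]
```

The code block is a `code` node with a string `value`, so its contents are never visited as candidate headings.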

Markdown Behavior

Must Support

  • YAML frontmatter at the start of a note. Frontmatter is not outline content.
  • ATX headings: # through ######.
  • Optional closing hashes: ## Title ##.
  • Setext headings:
      Title
      =====

      Subtitle
      --------
  • Duplicate headings.
  • Empty heading text.
  • Inline formatting inside headings.
  • Links, images, code spans, escaped characters, and emphasis inside headings.
  • CRLF and LF line endings.
  • Notes with no headings.
  • Empty notes.
  • Large notes up to the configured cap.

Must Not Treat As Headings

  • Heading-like text inside fenced code blocks.
  • Heading-like text inside indented code blocks.
  • Heading-like text inside raw HTML blocks unless the parser returns a normal Markdown heading node.
  • YAML frontmatter keys.
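
The first rule above is the usual failure mode of line-based matching. The short demonstration below is illustrative only, not project code: it shows a naive ATX regex counting a shell comment inside a fenced block as a heading.

```javascript
// Build the fence marker programmatically so this example can itself sit
// inside a fenced block.
const fence = "`".repeat(3);

const note = [
  "# Real Heading",
  "",
  fence + "sh",
  "# install dependencies",
  "npm install",
  fence,
].join("\n");

// Naive approach: any line that starts with 1-6 hashes and a space.
const naiveMatches = note.split("\n").filter((line) => /^#{1,6}\s/.test(line));

console.log(naiveMatches.length); // 2, although the note has one real heading
```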

Explicitly Deferred Or Unsupported

  • MDX/JSX heading semantics.
  • Custom HTML heading extraction from <h1> / <h2> tags.
  • Notebook-style cell metadata.
  • PDF page headings.
  • OCR-derived headings.
  • Wikilink graph hierarchy.

Heading Text Normalization

Heading text must be plain text, not rendered HTML.

Rules:

  • Strip Markdown formatting syntax through AST text extraction.
  • Preserve visible text content.
  • Normalize internal whitespace to a single space.
  • Trim leading and trailing whitespace.
  • Treat HTML or script-looking content as text, never executable markup.

Example:

## **Bold** [Link](https://example.com) `code`

Expected text:

Bold Link code
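
A minimal sketch of the extraction rules above, applied to an mdast-shaped heading node. The function names are hypothetical, and the node below is hand-written to match what a CommonMark parser would be expected to produce for the example heading.

```javascript
// Concatenate visible text from an mdast-shaped node without rendering it.
function extractText(node) {
  if (typeof node.value === "string") return node.value; // text, inlineCode, html
  if (typeof node.alt === "string") return node.alt;     // image alt text
  return (node.children ?? []).map(extractText).join("");
}

// Collapse whitespace runs to one space and trim both ends.
function headingText(headingNode) {
  return extractText(headingNode).replace(/\s+/g, " ").trim();
}

// Hand-written node for: ## **Bold** [Link](https://example.com) `code`
const heading = {
  type: "heading",
  depth: 2,
  children: [
    { type: "strong", children: [{ type: "text", value: "Bold" }] },
    { type: "text", value: " " },
    { type: "link", url: "https://example.com", children: [{ type: "text", value: "Link" }] },
    { type: "text", value: " " },
    { type: "inlineCode", value: "code" },
  ],
};

console.log(headingText(heading)); // Bold Link code
```

Inline HTML nodes carry their source in `value`, so markup-looking content comes back as literal characters and is never interpreted.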

Heading ID Contract

IDs are deterministic within one outline response and stable for the same path and same heading sequence.

Recommended MVP format:

h<level>-<slug>-<ordinal>

Example:

h2-install-0002

Rules:

  • level is the Markdown heading depth.
  • slug is normalized from heading text using the same conservative slug discipline as Knowtation project/tag slugs where practical.
  • ordinal is the one-based heading occurrence index in document order, zero-padded to four digits.
  • Duplicate headings receive distinct ordinals.
  • IDs are not persisted.
  • IDs are not promised to survive heading reordering or major parser changes.

If future versions need stronger stability across edits, introduce a new schema version.
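
The recommended format can be sketched as follows. The slug rules here (lowercase ASCII letters and digits, hyphen-separated) and the `untitled` fallback for empty heading text are assumptions for illustration; the real rules should follow the existing Knowtation project/tag slug discipline. All function names are hypothetical.

```javascript
// Assumed slug rules: lowercase, runs of other characters become one hyphen,
// no leading or trailing hyphen.
function slugify(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// h<level>-<slug>-<ordinal>, with a one-based ordinal padded to four digits.
function makeHeadingId(level, text, ordinal) {
  const slug = slugify(text) || "untitled"; // fallback is an assumption
  return `h${level}-${slug}-${String(ordinal).padStart(4, "0")}`;
}

// Ordinals follow document order, so duplicate headings still get distinct IDs.
function assignIds(headings) {
  return headings.map((h, i) => ({ ...h, id: makeHeadingId(h.level, h.text, i + 1) }));
}

const withIds = assignIds([
  { level: 1, text: "Introduction" },
  { level: 2, text: "Install" },
  { level: 2, text: "Install" },
]);
console.log(withIds.map((h) => h.id));
// [ 'h1-introduction-0001', 'h2-install-0002', 'h2-install-0003' ]
```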

Caps And Truncation

The parser must cap work to prevent accidental expensive calls on huge imported notes.

Initial recommended caps:

  • max input characters parsed: 1,000,000
  • max headings returned: 500

If input exceeds the character cap, parse only if the parser behavior is still safe and bounded. Otherwise return a runtime error with a clear message.

If headings exceed the heading cap:

  • return only the first capped set in document order
  • set truncated: true

Caps must be constants in the parser module and covered by tests.
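
A minimal sketch of the two caps as module constants. It takes the stricter branch of the input rule and always rejects oversized input; the constant and function names are hypothetical, and the values are the recommended initial caps.

```javascript
const MAX_INPUT_CHARS = 1_000_000;
const MAX_HEADINGS = 500;

// Stricter branch of the input rule: refuse oversized input outright.
function assertWithinInputCap(markdown) {
  if (markdown.length > MAX_INPUT_CHARS) {
    throw new RangeError(`Note exceeds ${MAX_INPUT_CHARS} characters; outline not built.`);
  }
}

// Keep the first `max` headings in document order and report truncation.
function capHeadings(headings, max = MAX_HEADINGS) {
  const truncated = headings.length > max;
  return { headings: truncated ? headings.slice(0, max) : headings, truncated };
}

const many = Array.from({ length: 501 }, (_, i) => ({ level: 2, text: `Section ${i + 1}` }));
const capped = capHeadings(many);
console.log(capped.headings.length, capped.truncated); // 500 true
```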

Security Invariants

General

  • Outline is note-content-derived data.
  • A caller must be allowed to read the note before reading the outline.
  • The response must never include the note body.
  • The response must never include full frontmatter.
  • The response must never include absolute paths.
  • The response must never render heading Markdown into HTML.
  • The response must never execute or trust content from headings.
  • Logs must not include heading text, body text, secrets, or raw upstream responses.

Local CLI And Self-Hosted MCP

  • Resolve paths with existing vault path safety helpers.
  • Only read files under the configured vault root.
  • Respect existing note read behavior.
  • Do not read .env, config/local.yaml, data/, or any ignored/non-vault file.
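
The existing vault path safety helpers remain the source of truth for these rules. Purely as an illustration of the checks they imply, the sketch below rejects absolute and traversal paths using string handling only. `normalizeVaultPath` is a hypothetical name, and the sketch does not cover symlinks or the ignore rules.

```javascript
// Returns a normalized vault-relative POSIX path or throws.
// Backslashes are treated as separators, which is deliberately conservative.
function normalizeVaultPath(input) {
  if (typeof input !== "string" || input.length === 0) {
    throw new Error("path is required");
  }
  if (input.includes("\0")) {
    throw new Error("path contains a NUL byte");
  }
  if (/^(?:[\\\/]|[A-Za-z]:)/.test(input)) {
    throw new Error("absolute paths are not allowed");
  }
  const segments = input.split(/[\\\/]+/).filter((s) => s !== "" && s !== ".");
  if (segments.length === 0 || segments.includes("..")) {
    throw new Error("path escapes the vault");
  }
  return segments.join("/");
}

console.log(normalizeVaultPath("inbox/./example.md")); // inbox/example.md
```

Rejecting any `..` segment is stricter than resolving it, which avoids having to reason about whether a path such as `a/../../b` lands inside the vault.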

Hosted MCP

  • Use the same effective canister user as get_note.
  • Use the active X-Vault-Id.
  • Include gateway/canister auth headers exactly like existing hosted note reads.
  • Do not expose outlines through resources/list.
  • Do not add outline resource URIs in phase 1.
  • Viewer can read outlines only for notes the viewer can already read.
  • Editor does not get broader outline visibility than viewer.
  • Admin follows existing admin note-read behavior.
  • Evaluator behavior must be explicitly tested before enabling hosted outline access for evaluator sessions.

Memory, Daemon, And Discover Interaction

This MVP does not write memory events.

Rationale:

  • Memory records activity over time.
  • NoteOutline is a derived view of current note content.
  • Duplicating heading text into memory creates unnecessary leakage and stale-data risk.

Future phases may record coarse lifecycle events such as note_outline_read only after a separate privacy review. That event is not part of this MVP.

Imports And PageIndex Interaction

This MVP does not change imports.

Existing imports can produce Markdown notes. The outline parser can read those notes after import because they are normal vault content.

PageIndex remains deferred. When a future PageIndexProvider exists, it must normalize provider output into a Knowtation-owned format. It must not become the source of truth.

Before any PageIndex provider ships, there must be a separate consent, retention, deletion, audit, and provider-key spec.

Scooling Interaction

Scooling can use this as a future adapter target:

KnowtationVaultAdapter.getNoteOutline(path)

Scooling must not parse Markdown itself as the source of truth. Scooling can display a placeholder until Knowtation exposes the relevant surface.

Test Matrix

Unit

  • Frontmatter ignored as outline content.
  • ATX headings parse correctly.
  • Setext headings parse correctly.
  • Fenced code block headings are ignored.
  • Indented code block headings are ignored.
  • Duplicate headings receive deterministic distinct IDs.
  • Inline heading formatting becomes plain text.
  • Malicious HTML/script-like heading text stays plain text.
  • Empty note returns an empty headings array.
  • No-heading note returns an empty headings array.
  • CRLF input is handled.
  • Heading cap sets truncated: true.

Integration

  • CLI reads a fixture vault note and returns valid JSON.
  • CLI rejects missing paths.
  • CLI rejects traversal paths.
  • Self-hosted MCP tool returns the same JSON shape as CLI.

End To End

  • Scooling-facing adapter tests can later use the CLI/MCP shape without note bodies.
  • Not part of the parser-only phase.

Stress

  • Large Markdown note stays within parser time and memory budget.
  • Many headings are capped deterministically.

Data Integrity

  • Parser output does not mutate notes.
  • Parser output does not write sidecars.
  • Parser output does not change vectors, memory, or indexes.
  • IDs are deterministic for repeated calls with identical input.

Performance

  • Parser is linear or near-linear for normal Markdown fixtures.
  • Huge-input cap prevents unbounded work.

Security

  • No body text in output.
  • No full frontmatter in output.
  • No absolute path in output.
  • Path traversal fails.
  • Hosted outline uses the same vault/user headers as get_note.
  • Hosted unauthorized and missing notes do not leak extra information beyond existing note-read behavior.
  • Hosted viewer cannot read outlines outside active vault/scope.
  • Tool listing role tests include get_note_outline only when enabled for that role.

REST And Scooling Bridge Update

The separately reviewed REST slice adds GET /api/v1/note-outline?path=... for self-hosted Hub and hosted gateway. The route is auth-gated, one-note bounded, and returns only the existing knowtation.note_outline/v1 JSON contract.

The Scooling smoke bridge adds GET /scooling/note-outline/smoke?path=.... It is disabled by default, limited to local or staging smoke validation, owns the upstream bearer token, rejects credentials supplied by Scooling, validates the upstream body-free payload, and returns only the raw NoteOutline JSON Scooling can validate.

This REST/bridge slice does not add note body output, snippets, full frontmatter, absolute paths, MCP resources, search, vectors, PageIndex, OCR, persistence, summaries, or write-back.
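
A client-side sketch of building a request for the route described above. The host name, the `Authorization: Bearer` scheme, and the helper name are assumptions for illustration; only the route path and the `path` query parameter come from this spec.

```javascript
// Builds, but does not send, a request description for the note-outline route.
// Note: the leading "/" resolves against the origin of baseUrl, so any path
// prefix on baseUrl is dropped.
function buildNoteOutlineRequest(baseUrl, notePath, token) {
  const url = new URL("/api/v1/note-outline", baseUrl);
  url.searchParams.set("path", notePath); // percent-encodes the note path
  return {
    method: "GET",
    url: url.toString(),
    headers: {
      Accept: "application/json",
      Authorization: `Bearer ${token}`, // assumed auth scheme
    },
  };
}

const request = buildNoteOutlineRequest("https://hub.example.test", "inbox/example.md", "TOKEN");
console.log(request.url);
// https://hub.example.test/api/v1/note-outline?path=inbox%2Fexample.md
```

Setting the parameter through `searchParams` rather than string concatenation keeps slashes, spaces, and `&` in note paths from altering the query string.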

Files To Modify By Phase

Phase 1A

  • package.json
  • package-lock.json
  • lib/note-outline.mjs
  • test/note-outline.test.mjs

Phase 1B

  • cli/index.mjs
  • test/cli.test.mjs
  • docs/SPEC.md
  • docs/CLI-JSON-SCHEMA.md
  • docs/RETRIEVAL-AND-CLI-REFERENCE.md

Phase 1C

  • mcp/create-server.mjs
  • local MCP tests
  • docs/AGENT-INTEGRATION.md

Phase 1D

  • hub/gateway/mcp-hosted-server.mjs
  • hub/gateway/mcp-tool-acl.mjs
  • test/mcp-hosted-tools-list.test.mjs
  • hosted MCP security/parity tests
  • docs/PARITY-MATRIX-HOSTED.md

Stop Conditions

Stop and re-plan if any of the following become necessary:

  • returning note body text
  • returning line ranges in hosted output
  • broadening the REST route beyond one authorized path
  • accepting credentials from Scooling
  • returning a transport envelope that differs from the raw NoteOutline JSON contract
  • changing search/index/vector behavior
  • adding persistence
  • adding PageIndex
  • adding OCR
  • adding LLM summaries
  • changing canister storage
  • weakening hosted scope behavior
  • exposing outline resources through MCP resource listings

Acceptance Criteria

The MVP is acceptable only when:

  • The spec is reviewed and accepted.
  • Parser tests are written before parser implementation.
  • Parser uses a block-aware Markdown parser.
  • CLI and MCP surfaces return the same JSON contract.
  • Hosted MCP access is gated exactly like note body reads.
  • No runtime feature writes derived outline data.
  • No output includes body text, full frontmatter, absolute paths, or secrets.
  • Seven-tier tests are present for shipped phases.

Recommendation

Completed MVP implementation sequence:

  1. Spec review.
  2. Parser tests.
  3. Parser module.
  4. CLI command.
  5. Self-hosted MCP tool.
  6. Hosted MCP tool after security tests pass.
  7. Auth-gated REST route and disabled-by-default Scooling smoke bridge after the REST safety review passes.

Next, continue Knowtation development from local Muse main while remote staging authentication is unavailable.

Do not begin PageIndex, section search, summaries, persistence, or broader REST expansion as a bundled follow-on. Each of those needs a separate review pass, explicit scope, and tests before implementation.
