
Knowtation — Architecture

Canonical spec: data formats, the CLI surface, and contracts are defined in docs/SPEC.md. The whitepaper (docs/WHITEPAPER.md) covers the product thesis in depth and includes an architecture diagram (§15). This file is the structural overview.


High-level system map

Sources (17 import source types + 4 capture channels)
  Markdown, PDF, DOCX, URL, ChatGPT, Claude, Mem0, Notion, Jira, Linear, NotebookLM,
  GDrive, MIF, Supabase, Audio (Whisper), Video (Whisper), Wallet CSV
  + file/stdin, HTTP webhook, Slack/Discord/Telegram adapters
          │
          ▼
  Vault (Markdown + YAML frontmatter) ← source of truth, editor-agnostic
          │
          ├── Index: chunk → embed → vector store (Qdrant or sqlite-vec)
          │
          ├── Memory: event log + semantic recall + consolidation (5 providers)
          │
          └── Trust pipeline: proposals → review → attestation → ICP canister
          │
          ▼
  Agent surface
    CLI    — 25+ commands (incl. doctor), JSON output, all filters and token levers
    MCP    — 33 tools, 24 resources (incl. knowtation://prime), 13 prompts (stdio or HTTP)
    Hub    — REST API + web UI (self-hosted or hosted at knowtation.store)

Deployment modes

Self-hosted

Clone the repo, run npm install, configure config/local.yaml (vault path, embedding provider, vector backend), then run npm run index and, optionally, npm run hub. The vault, index, and memory data stay on your machine. You keep full control, with no external dependencies beyond your chosen embedding provider.

Hosted (knowtation.store)

Three services run on Netlify and the Internet Computer:

| Service | Technology | Role |
| --- | --- | --- |
| Gateway (hub/gateway/) | Node.js / Netlify Functions | OAuth (Google + GitHub), JWT auth, billing (Stripe), image proxy, MCP OAuth 2.1, rate limiting, request routing |
| Bridge (hub/bridge/) | Node.js / Netlify Functions | Vault operations, GitHub integration (backup/sync), team roles, import, memory consolidation |
| Canister (hub/icp/) | Motoko / Internet Computer | Vault note storage, attestation anchoring, admin functions, gateway-auth-gated API |

Browser / Agent
      │  HTTPS
      ▼
  Gateway (Netlify)  ──JWT──▶  Bridge (Netlify)
      │                               │
      │  X-Gateway-Auth               │  X-Gateway-Auth
      ▼                               ▼
  ICP Canister                   GitHub API
  (rsovz-byaaa-aaaaa-qgira-cai)  (vault backup)

The gateway and bridge communicate with the ICP canister using an X-Gateway-Auth shared secret. The browser never talks to the canister directly; all canister access is proxied through the gateway.
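
As an illustration of the shared-secret gate, here is a minimal Python sketch of how a receiving service could validate the X-Gateway-Auth header. The function name `is_gateway_request` and the choice to reject every request when no secret is configured are assumptions for this example; the actual services are written in Node.js and Motoko and may behave differently.

```python
import hmac

def is_gateway_request(headers, shared_secret):
    """Return True only if the X-Gateway-Auth header matches the shared secret."""
    if not shared_secret:
        # Assumption: an unset secret rejects everything rather than allowing everything.
        return False
    presented = ""
    for name, value in headers.items():
        if name.lower() == "x-gateway-auth":  # HTTP header names are case-insensitive
            presented = value
            break
    # compare_digest avoids leaking the position of the first mismatch through timing.
    return hmac.compare_digest(presented.encode("utf-8"), shared_secret.encode("utf-8"))

ok = is_gateway_request({"X-Gateway-Auth": "s3cret"}, "s3cret")
denied = is_gateway_request({"x-gateway-auth": "guess"}, "s3cret")
```

A constant-time comparison is the usual choice for secrets like this, since a plain `==` can reveal how many leading characters matched.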


Core components

Vault

  • Format: Markdown + YAML frontmatter. Editor-agnostic (Obsidian, SilverBullet, Foam, VS Code, or any text editor).
  • Layout: vault/inbox/, vault/captures/, vault/projects/<slug>/, vault/areas/, vault/archive/, vault/media/audio|video/, vault/templates/, vault/meta/.
  • Portability: The vault is a folder of files. Migrate by copying it. Version with Git for history and rollback.
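
To make the note format concrete, the sketch below splits a note into its frontmatter and body using only the Python standard library. It handles flat `key: value` pairs only, so it is not a YAML parser; the field values in the sample note are hypothetical, and the authoritative frontmatter schema is in docs/SPEC.md.

```python
def split_frontmatter(text):
    """Split a note into (frontmatter dict, body). Flat `key: value` pairs only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        return {}, text  # unterminated block: treat the whole file as body
    meta = {}
    for line in lines[1:end]:
        if ":" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip("\"'")
    body = "\n".join(lines[end + 1:])
    return meta, body

# Hypothetical note; the values are illustrative only.
note = """---
project: atlas
date: 2026-04-01
---
# Summary
Shipped the importer.
"""
meta, body = split_frontmatter(note)
```

Because a note is just text with a delimited header, any editor or script can read it without Knowtation installed, which is what makes the vault portable.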

Index

Chunks vault notes by heading or size, embeds them, and upserts into the vector store. Metadata includes path, project, tags, dates, entity, episode, and causal chain fields. Supports:

  • sqlite-vec — zero-server local SQLite file (default for self-hosted)
  • Qdrant — separate vector database for production deployments
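
The chunking step can be sketched as follows. This is a simplified illustration, not the project's indexer: it splits on Markdown headings, then caps each section at a character limit. The name `chunk_note`, the `max_chars` default, and the chunk dictionary shape are assumptions; embedding and upserting are left out.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)$")

def chunk_note(path, body, max_chars=800):
    """Split a note body by heading, then by size, keeping path and heading as metadata."""
    chunks = []
    heading, buffer = "", []

    def flush():
        # Emit the buffered section in slices of at most max_chars characters.
        text = "\n".join(buffer).strip()
        for start in range(0, len(text), max_chars):
            chunks.append({"path": path, "heading": heading,
                           "text": text[start:start + max_chars]})

    for line in body.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            heading, buffer = match.group(1).strip(), []
        else:
            buffer.append(line)
    flush()
    return chunks

chunks = chunk_note("vault/inbox/a.md", "Intro\n# A\nalpha\n## B\nbeta")
sized = chunk_note("vault/inbox/b.md", "# A\n" + "x" * 10, max_chars=4)
```

Cutting at fixed character offsets is crude; a real indexer would more likely break on sentence or paragraph boundaries and might overlap adjacent chunks.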

Memory (5 providers)

| Provider | Storage | Semantic search |
| --- | --- | --- |
| file | Append-only JSONL + state.json | No |
| vector | File + embeddings in vector store | Yes |
| mem0 | File + Mem0 REST API dual-write | Yes |
| supabase | File + pgvector table | Yes |
| encrypted | AES-256-GCM at rest (scrypt key) | No |

The memory layer defines fifteen event types and supports three-pass consolidation (consolidate / verify / discover), session summaries, retention enforcement, and either cross-vault or per-vault scope.
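
For a feel of what the file provider's storage looks like, here is a rough Python sketch of an append-only JSONL log with a small state.json summary beside it. The file name `events.jsonl`, the event fields (`ts`, `type`, `payload`), the state fields, and the event type names are all hypothetical; only the JSONL-plus-state.json layout comes from the table above.

```python
import json
import tempfile
import time
from pathlib import Path

def append_event(memory_dir, event_type, payload):
    """Append one event to the JSONL log and refresh the state.json summary."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    event = {"ts": time.time(), "type": event_type, "payload": payload}
    # Append-only: existing lines are never rewritten.
    with (memory_dir / "events.jsonl").open("a", encoding="utf-8") as log:
        log.write(json.dumps(event, sort_keys=True) + "\n")

    state_path = memory_dir / "state.json"
    if state_path.exists():
        state = json.loads(state_path.read_text(encoding="utf-8"))
    else:
        state = {"event_count": 0}
    state["event_count"] += 1
    state["last_event_type"] = event_type
    state_path.write_text(json.dumps(state), encoding="utf-8")
    return event

with tempfile.TemporaryDirectory() as tmp:
    memory_dir = Path(tmp) / "memory"
    append_event(memory_dir, "search", {"query": "importer"})
    append_event(memory_dir, "write", {"path": "vault/inbox/a.md"})
    logged = (memory_dir / "events.jsonl").read_text(encoding="utf-8").splitlines()
    state = json.loads((memory_dir / "state.json").read_text(encoding="utf-8"))
```

One JSON object per line keeps the log easy to tail, diff, and replay, which suits a later consolidation pass.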

CLI

Primary interface. All commands output JSON with --json. Key subcommands: search, get-note, list-notes, write, export, import, memory, propose, capture, transcribe, index, daemon.
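
An agent calling the CLI needs to turn its output into either a result or an error. The sketch below assumes the error shape listed under Interface contracts (`{ "error": "...", "code": "..." }`) and that a non-zero exit code signals failure. The helper names, the sample payloads, the `NOT_FOUND` code, and the assumption that the error body arrives on stdout are all hypothetical.

```python
import json

class CliError(Exception):
    """Raised when the CLI reports a failure."""
    def __init__(self, message, code, exit_code):
        super().__init__(message)
        self.code = code
        self.exit_code = exit_code

def parse_cli_result(stdout, exit_code):
    """Decode --json output; raise CliError on a non-zero exit code."""
    if exit_code == 0:
        return json.loads(stdout)  # invalid JSON on success is a genuine bug, so let it raise
    try:
        data = json.loads(stdout)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, dict) and "error" in data:
        raise CliError(data["error"], data.get("code"), exit_code)
    raise CliError("CLI failed without a JSON error body", None, exit_code)

result = parse_cli_result('{"results": []}', 0)
try:
    parse_cli_result('{"error": "note not found", "code": "NOT_FOUND"}', 1)
    failure = None
except CliError as exc:
    failure = exc
```

In practice the `stdout` and `exit_code` arguments would come from something like `subprocess.run(..., capture_output=True, text=True)`.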

MCP Server

33 tools, 23 resources, 13 prompts over stdio or HTTP transports. Wraps the same backend as the CLI. Hosted MCP adds OAuth 2.1 and role-gated access (viewer / editor / admin). Configure with npm run mcp or npm run mcp:http.

Hub

Web UI and REST API. Features: Google/GitHub OAuth, proposals with LLM enrichment and rubric scoring, team roles (viewer/editor/admin/evaluator), invite-by-link, multi-vault, GitHub backup, image upload/proxy, Stripe billing, settings.

Attestation and ICP anchoring

AIR (Attestation Integrity Records) records intent before writes and exports. HMAC-signed records can be dual-written to the ICP attestation canister (dejku-syaaa-aaaaa-qgy3q-cai) for immutable, decentralized audit trails. Pending records are anchored in batch via POST /api/v1/attest/anchor-pending.
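
The HMAC signing step can be illustrated with a short Python sketch. It assumes HMAC-SHA256 over a canonical JSON encoding, which is a common approach but not confirmed by this document; the record fields (`action`, `path`, `ts`), the `signature` key, and the function names are hypothetical.

```python
import hashlib
import hmac
import json

def sign_record(record, key):
    """Return a copy of the record with an HMAC-SHA256 signature attached."""
    # Canonical form: sorted keys and fixed separators, so equal records sign identically.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(key, canonical, hashlib.sha256).hexdigest()
    return {**record, "signature": signature}

def verify_record(signed, key):
    """Recompute the signature over every field except `signature` and compare."""
    record = {k: v for k, v in signed.items() if k != "signature"}
    expected = sign_record(record, key)["signature"]
    return hmac.compare_digest(expected, str(signed.get("signature", "")))

key = b"demo-key"  # placeholder; a real key would come from configuration
signed = sign_record({"action": "write", "path": "vault/inbox/a.md", "ts": 1700000000}, key)
tampered = {**signed, "path": "vault/archive/a.md"}
```

Signing a canonical encoding matters because two JSON serializations of the same record can differ in key order or whitespace, which would otherwise change the signature.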

Billing (hosted)

Stripe-backed tiers (Free, Plus, Growth, Pro). Operations are classified as search, index, consolidation, note write, or proposal write. Limits are enforced when BILLING_ENFORCE=true; otherwise shadow mode logs usage without blocking. Token packs provide additional indexing capacity.
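
The difference between enforcement and shadow mode can be sketched as a single decision function. This is an illustrative guess at the logic, not the gateway's code: the function name, the `used`/`limit` counters, the underscore spelling of the operation names, and the returned dictionary are assumptions. Only the `BILLING_ENFORCE=true` switch and the five operation classes come from the text above.

```python
import os

OPERATIONS = ("search", "index", "consolidation", "note_write", "proposal_write")

def check_operation(operation, used, limit, env=None):
    """Decide whether an operation may proceed under the current billing mode."""
    env = os.environ if env is None else env
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    over_limit = used >= limit
    enforce = env.get("BILLING_ENFORCE") == "true"
    return {
        # Shadow mode never blocks; it only reports that the limit was reached.
        "allowed": not (enforce and over_limit),
        "over_limit": over_limit,
        "mode": "enforce" if enforce else "shadow",
    }

shadow = check_operation("search", used=100, limit=100, env={})
enforced = check_operation("search", used=100, limit=100, env={"BILLING_ENFORCE": "true"})
```

Returning `over_limit` in both modes lets shadow mode log exactly the requests that enforcement would have blocked.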


Security

The codebase completed a 4-phase pre-launch security audit (Phases 0–3, April 2026). Key controls:

  • X-Gateway-Auth shared secret gates all canister and bridge access
  • JWT expiry: 24h (gateway), 1h (self-hosted)
  • OAuth redirect token delivered via URL fragment (#token=), not query param
  • Short-lived HMAC-signed image proxy tokens (5 min TTL)
  • CORS locked to gateway origin on ICP canister when secret is set
  • Role-based access control on all bridge write routes

See docs/SECURITY-AUDIT-PLAN.md for the full remediation record.
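
As an example of one control in the list above, the short-lived image proxy token could look roughly like this. The 5-minute TTL is from the list; the token layout (`payload.signature`, both base64url), the use of HMAC-SHA256, and the function names are assumptions made for this sketch.

```python
import base64
import hashlib
import hmac
import time

TTL_SECONDS = 300  # 5 minutes, matching the TTL stated above

def issue_token(image_path, key, now=None):
    """Bind a token to one image path and an expiry timestamp."""
    expires = int((time.time() if now is None else now) + TTL_SECONDS)
    payload = f"{image_path}|{expires}".encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode("ascii") + "."
            + base64.urlsafe_b64encode(signature).decode("ascii"))

def verify_token(token, image_path, key, now=None):
    """Accept only an untampered, unexpired token issued for this exact path."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
        path, expires_text = payload.decode("utf-8").rsplit("|", 1)
        expires = int(expires_text)
    except ValueError:  # also covers bad base64 and bad UTF-8, both ValueError subclasses
        return False
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    current = time.time() if now is None else now
    return hmac.compare_digest(signature, expected) and path == image_path and current < expires

key = b"demo-key"  # placeholder; a real key would come from configuration
token = issue_token("media/a.png", key, now=1000)
```

Including the path in the signed payload stops a token issued for one image from being replayed against another, and the embedded expiry means the server needs no token store.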


Interface contracts

  • CLI → Agent: --json flag on all commands; error shape { "error": "...", "code": "..." }; exit codes 0/1/2
  • MCP: Tools mirror CLI semantics exactly; MCP is transport only
  • Hub REST API: JWT bearer auth; documented in docs/HUB-API.md
  • Capture plugins: Write Markdown to vault/inbox/ with source, date, source_id frontmatter; contract in docs/CAPTURE-CONTRACT.md
  • Vault format: Frontmatter schema in docs/SPEC.md
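
To show what the capture contract asks of a plugin, here is a minimal Python sketch that renders an inbox note with the three required frontmatter fields (source, date, source_id). The file-naming scheme, the function name, and the sample values are hypothetical; docs/CAPTURE-CONTRACT.md is the authority on the actual requirements.

```python
import json
import re
from datetime import date

def render_capture(source, source_id, body, captured_on):
    """Return (relative path, note text) for a capture destined for vault/inbox/."""
    # Hypothetical naming scheme: date plus a slug built from source and source_id.
    slug = re.sub(r"[^a-z0-9]+", "-", f"{source}-{source_id}".lower()).strip("-")
    rel_path = f"vault/inbox/{captured_on.isoformat()}-{slug}.md"
    # json.dumps yields a double-quoted string, which is also a valid YAML scalar.
    text = (
        "---\n"
        f"source: {json.dumps(source)}\n"
        f"date: {captured_on.isoformat()}\n"
        f"source_id: {json.dumps(source_id)}\n"
        "---\n\n"
        f"{body.strip()}\n"
    )
    return rel_path, text

rel_path, text = render_capture("slack", "C123/456", "Hello from the channel.", date(2026, 4, 1))
```

Quoting `source` and `source_id` keeps identifiers containing colons or slashes from being misread as YAML structure.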

Key documentation

| Document | What it covers |
| --- | --- |
| docs/WHITEPAPER.md | Product thesis, architecture diagram, full feature inventory |
| docs/SPEC.md | Frontmatter, CLI commands, config, MCP, contracts |
| docs/POPULAR-PROMPTS-AND-STARTERS.md | MCP prompt names, copy-paste starters for any LLM, CLI one-liners |
| docs/HUB-API.md | Hub REST API and auth |
| docs/AGENT-ORCHESTRATION.md | Multi-agent setup |
| docs/MEMORY-CONSOLIDATION-GUIDE.md | Consolidation daemon |
| docs/IMPORT-SOURCES.md | All 17 source_type importers, Hub bulk |
| docs/IMPORT-URL-AND-DOCUMENTS-PHASES.md | URL/DOCX/PDF and Hub bulk roadmap and status |
| docs/SECURITY-AUDIT-PLAN.md | Security audit phases and controls |