Background Memory Consolidation Daemon — Full Spec
Overview
An opt-in, long-running background process that uses a user-provided LLM to autonomously maintain, consolidate, and improve the Knowtation memory layer while the user is idle. The daemon reads from the memory event log, identifies patterns and contradictions, merges redundant observations into concise facts, verifies stale references, and regenerates the memory pointer index — all without human intervention.
The daemon is architecturally independent of the memory improvements plan (pointer index, topic partitioning, skeptical memory, strict write discipline) but benefits from all of them. It can be implemented before or after those improvements, though topic partitioning and the pointer index make its output significantly more useful.
Motivation
Memory event logs grow linearly. After weeks of use, a vault may accumulate thousands of events — many redundant ("searched for X" five times), some contradictory (a note path was renamed but old events still reference the original path), and some stale (referencing notes that were deleted). Without consolidation, agents either load too much context (expensive, slow) or miss important patterns buried in noise.
The daemon solves this by running periodic "maintenance passes" — similar in concept to database vacuuming or garbage collection, but with the added intelligence of an LLM to reason about semantic relationships, contradictions, and what matters.
Architecture
┌──────────────────────────────────────────────────────┐
│ User's Machine │
│ │
│ ┌─────────────┐ ┌─────────────────────────────┐ │
│ │ Knowtation │ │ Daemon Process (separate) │ │
│ │ CLI / MCP │ │ │ │
│ │ │ │ ┌─────────┐ ┌────────────┐ │ │
│ │ writes to │───▶│ │ Watcher │ │ Scheduler │ │ │
│ │ events.jsonl │ └────┬────┘ └─────┬──────┘ │ │
│ │ │ │ │ │ │ │
│ └─────────────┘ │ ┌────▼──────────────▼────┐ │ │
│ │ │ Consolidation Engine │ │ │
│ │ │ │ │ │
│ │ │ 1. Read recent events │ │ │
│ │ │ 2. Group by topic │ │ │
│ │ │ 3. LLM: merge/dedup │ │ │
│ │ │ 4. Verify stale refs │ │ │
│ │ │ 5. Write consolidation │ │ │
│ │ │ 6. Rebuild index │ │ │
│ │ └────────────┬────────────┘ │ │
│ │ │ │ │
│ │ ┌────────────▼────────────┐ │ │
│ │ │ User-Provided LLM │ │ │
│ │ │ (OpenAI / Anthropic / │ │ │
│ │ │ Ollama / OpenRouter / │ │ │
│ │ │ any OpenAI-compat API) │ │ │
│ │ └─────────────────────────┘ │ │
│ └───────────────────────────────┘ │
└──────────────────────────────────────────────────────┘
The daemon runs as a separate Node.js process — not inside the CLI or MCP server. This isolation ensures:
- The main agent's context window is never polluted by maintenance work.
- A crash in the daemon does not affect the user's active session.
- The daemon can be started, stopped, and configured independently.
What the Daemon Does (Consolidation Passes)
Each pass performs these steps in order:
Pass 1: Event Consolidation
- Read events since the last consolidation (or last 24h if first run).
- Group events by topic (if topic partitioning is enabled) or by event type.
- Send each group to the LLM with a structured prompt:
- "Here are N events about topic X. Merge redundant observations. Remove contradictions (keep the most recent). Distill into 3-5 concise factual statements."
- Store the result as a
consolidationevent with the distilled facts. - Update the pointer index to reference the consolidation.
Pass 2: Stale Reference Detection
- Scan recent events for note path references.
- Check each path against the vault filesystem — does the file still exist? Has it been modified since the event?
- Mark stale events with
confidence: "stale"(if skeptical memory is enabled). - Optionally store a
maintenanceevent noting which references are stale.
Pass 3: Relationship Discovery
- Read the current memory index and recent consolidation events.
- Send to LLM: "Given these topic summaries, what connections or patterns do you see across topics? Are there contradictions between topics? What questions remain unanswered?"
- Store discovered relationships as
insightevents. - Update the pointer index with cross-references.
Pass 4: Index Rebuild
- Regenerate
memory-index.mdfrom the current state. - Rebuild topic files if topic partitioning is enabled.
- Prune events beyond retention window.
LLM Configuration
The daemon reuses Knowtation's existing completeChat infrastructure (lib/llm-complete.mjs) which already supports OpenAI, Anthropic, and Ollama. Users bring their own API key / local model.
A new optional daemon.llm config section allows users to specify a different (typically cheaper) model for background work, separate from the model used for interactive features like session summaries.
Config Schema
# config/local.yaml — daemon section
daemon:
enabled: false # opt-in; daemon does nothing unless true
# --- Scheduling ---
interval_minutes: 120 # run consolidation every N minutes (default: 2 hours)
idle_only: true # only run when no CLI/MCP activity detected in last N minutes
idle_threshold_minutes: 15 # minutes of inactivity before daemon considers user "idle"
run_on_start: false # run one pass immediately when daemon starts
# --- Consolidation scope ---
lookback_hours: 24 # how far back to read events per pass (default: 24h)
max_events_per_pass: 200 # cap events processed per pass (cost guard)
max_topics_per_pass: 10 # cap topic groups sent to LLM per pass
# --- Passes to enable ---
passes:
consolidate: true # merge/dedup events (Pass 1)
verify: true # stale reference detection (Pass 2)
discover: false # relationship discovery (Pass 3) — more expensive, opt-in
rebuild_index: true # always rebuild index after other passes (Pass 4)
# --- LLM (optional override; falls back to main LLM config / env) ---
llm:
provider: null # openai | anthropic | ollama | openrouter | null (auto-detect from env)
model: null # e.g. "gpt-4o-mini", "claude-3-5-haiku-20241022", "llama3.2"
api_key_env: null # env var name for API key (e.g. "DAEMON_OPENAI_KEY"); null = use main key
base_url: null # for OpenAI-compatible APIs (OpenRouter, local proxies, vLLM, etc.)
max_tokens: 1024 # per LLM call
temperature: 0.2 # low temperature for factual consolidation
# --- Safety ---
dry_run: false # if true, log what would happen but don't write events
log_file: null # path to daemon log; null = data_dir/daemon.log
max_cost_per_day_usd: null # optional daily cost cap (requires token counting)
Environment Variable Overrides
| Env Var | Overrides |
|---|---|
KNOWTATION_DAEMON_ENABLED |
daemon.enabled |
KNOWTATION_DAEMON_INTERVAL |
daemon.interval_minutes |
KNOWTATION_DAEMON_LLM_PROVIDER |
daemon.llm.provider |
KNOWTATION_DAEMON_LLM_MODEL |
daemon.llm.model |
KNOWTATION_DAEMON_LLM_BASE_URL |
daemon.llm.base_url |
KNOWTATION_DAEMON_DRY_RUN |
daemon.dry_run |
CLI Interface
# Start the daemon (foreground, Ctrl+C to stop)
knowtation daemon start
# Start as a background process (writes PID to data_dir/daemon.pid)
knowtation daemon start --background
# Stop a running daemon
knowtation daemon stop
# Check daemon status (running, last pass time, events processed)
knowtation daemon status
# Run a single consolidation pass (no daemon loop)
knowtation memory consolidate
knowtation memory consolidate --dry-run
knowtation memory consolidate --passes consolidate,verify
# Show daemon log
knowtation daemon log
knowtation daemon log --tail 50
MCP Tools
| Tool | Description |
|---|---|
memory_consolidate |
Trigger a single consolidation pass. Params: dry_run, passes, lookback_hours. |
daemon_status |
Return daemon status (running, last pass, next scheduled, events processed). |
New Event Types
| Type | Trigger | Shape |
|---|---|---|
consolidation |
Daemon Pass 1 | { topic, facts: string[], event_count, since, until } |
maintenance |
Daemon Pass 2 | { stale_paths: string[], verified_paths: string[], checked_count } |
insight |
Daemon Pass 3 | { connections: string[], contradictions: string[], open_questions: string[] } |
These are added to MEMORY_EVENT_TYPES in lib/memory-event.mjs.
LLM Prompts
Consolidation Prompt (Pass 1)
System: You are a memory consolidation engine for a personal knowledge vault.
You receive a batch of timestamped activity events on a single topic.
Your job:
1. Merge redundant observations into single factual statements.
2. When events contradict each other, keep the most recent fact and discard the older one.
3. Distill the batch into 3-7 concise, factual statements.
4. Each statement must be a complete, standalone fact (no "as mentioned earlier").
5. Preserve note paths and dates when they add context.
Output format: JSON array of strings, one per fact. No commentary.
User: Topic: "{topic}"
Events ({count}):
{event_lines}
Relationship Discovery Prompt (Pass 3)
System: You are an insight engine for a personal knowledge vault.
Given topic summaries from the vault's memory, identify:
1. Connections between topics that the user might not have noticed.
2. Contradictions between topics (a fact in one topic conflicts with another).
3. Open questions — things the vault seems to be exploring but hasn't resolved.
Be concise. Each item should be 1-2 sentences.
Output format: JSON object with three arrays: "connections", "contradictions", "open_questions".
User: Topic summaries:
{topic_summaries}
Process Lifecycle
Startup
- Read config (
loadConfig()+ daemon section). - Validate LLM connectivity: send a trivial test prompt ("respond with OK"). Fail fast with a clear error if LLM is unreachable.
- Write PID file to
{data_dir}/daemon.pid. - Write startup event to daemon log.
- If
run_on_start, execute one full pass. - Enter scheduling loop.
Scheduling Loop
while (running) {
wait(interval_minutes)
if (idle_only && !isIdle()) continue
if (costCapExceeded()) { log("daily cost cap reached"); continue }
runConsolidationPass()
}
Idle Detection
The daemon checks the mtime of events.jsonl (and state.json) to determine when the last CLI/MCP activity occurred. If the file hasn't been modified in idle_threshold_minutes, the user is considered idle.
Alternative (if events.jsonl is not a reliable signal): the daemon can write and read a heartbeat file ({data_dir}/daemon-heartbeat.json) that the CLI/MCP update on every operation.
Shutdown
- On
SIGTERMorSIGINT: finish current pass (or abort gracefully), write shutdown event to log, remove PID file, exit. knowtation daemon stopreads PID from file, sendsSIGTERM.- If process doesn't exit in 10s, send
SIGKILL.
Crash Recovery
- If daemon crashes mid-pass, no data is corrupted (consolidation events are only written after the full pass completes, per strict write discipline).
- On next startup, the daemon detects stale PID file (process not running), cleans it up, and starts fresh.
- Daemon log records the last completed pass timestamp; the next pass picks up from there.
File Layout
lib/
memory-consolidate.mjs # Core consolidation engine (pure functions, no daemon loop)
daemon.mjs # Daemon process lifecycle (start, stop, schedule, idle detect)
daemon-llm.mjs # LLM wrapper: resolves daemon.llm config, falls back to completeChat
cli/
index.mjs # New subcommands: daemon start|stop|status|log, memory consolidate
mcp/
tools/memory.mjs # New tool: memory_consolidate, daemon_status
data_dir/
daemon.pid # PID file (written by daemon, read by CLI)
daemon.log # Structured log (JSONL)
daemon-heartbeat.json # Last activity timestamp (updated by CLI/MCP)
memory/{vault_id}/
events.jsonl # (existing) — daemon reads from here
state.json # (existing) — daemon reads from here
consolidations/ # Consolidation output files (one per pass)
2026-04-04T10-00Z.json
topics/ # (from topic partitioning) — daemon reads/rebuilds
Security Considerations
- No new secrets: The daemon uses the same API keys the user already has configured (or a dedicated env var name they specify in
daemon.llm.api_key_env). - Memory data sent to LLM: The daemon sends memory event summaries (metadata, paths, query strings) to the LLM. This is the same data the session summary feature already sends. Users who are concerned about this can use a local Ollama model.
- No vault content sent: The daemon does NOT read note bodies. It only reads memory events (which contain metadata like paths, query strings, counts — not full note text). Pass 2 (stale detection) checks file existence and mtime but does not read file content.
- Cost guardrails:
max_events_per_pass,max_topics_per_pass, and optionalmax_cost_per_day_usdprevent runaway API costs. - Dry run:
daemon.dry_run: truelets users observe what the daemon would do without any writes. - Process isolation: The daemon is a separate process. A bug in the daemon cannot corrupt the main CLI/MCP session.
- Append-only writes: The daemon only appends new events (consolidation, maintenance, insight). It never modifies or deletes existing events. Index regeneration is idempotent.
Implementation Phases
Phase A: Core Consolidation Engine (standalone, no daemon)
Effort: ~2-3 days
Files:
lib/memory-consolidate.mjs—consolidateMemory(config, opts)functionlib/daemon-llm.mjs— LLM wrapper with daemon config resolutionlib/memory-event.mjs— addconsolidation,maintenance,insightevent typescli/index.mjs—knowtation memory consolidatecommandmcp/tools/memory.mjs—memory_consolidatetooltest/memory-consolidate.test.mjs— unit tests
Deliverable: Users can manually run knowtation memory consolidate to trigger a single pass. No background process yet.
Phase B: Daemon Lifecycle
Effort: ~2-3 days
Files:
lib/daemon.mjs— process management (start, stop, schedule, idle detect, signal handling)cli/index.mjs—knowtation daemon start|stop|status|logcommandsmcp/tools/memory.mjs—daemon_statustooltest/daemon.test.mjs— lifecycle tests (start, stop, idle detection, PID management)
Deliverable: knowtation daemon start runs consolidation on a schedule. --background for detached mode.
Phase C: Stale Verification Pass
Effort: ~1 day
Files:
lib/memory-consolidate.mjs— addrunVerifyPass(config, events)test/memory-consolidate.test.mjs— verify pass tests
Deliverable: Daemon detects stale note references and marks events accordingly.
Phase D: Relationship Discovery Pass
Effort: ~1-2 days
Files:
lib/memory-consolidate.mjs— addrunDiscoverPass(config, consolidations)test/memory-consolidate.test.mjs— discovery pass tests
Deliverable: Daemon generates cross-topic insights. This pass is opt-in (passes.discover: true) since it's the most LLM-intensive.
Phase E: OpenAI-Compatible API Support
Effort: ~1 day
Files:
lib/daemon-llm.mjs— addbase_urlsupport for OpenRouter, vLLM, LM Studio, text-generation-webui, etc.test/daemon-llm.test.mjs— provider tests
Deliverable: Users can point the daemon at any OpenAI-compatible endpoint, not just the three built-in providers.
Phase F: Cost Tracking and Guardrails
Effort: ~1 day
Files:
lib/daemon.mjs— token counting (estimate from response), daily cost accumulator- Config:
max_cost_per_day_usd
Deliverable: Daemon stops making LLM calls if estimated daily cost exceeds the configured cap.
Total Estimated Effort
| Phase | Days | Depends On |
|---|---|---|
| A: Core Engine | 2-3 | Memory improvements plan (optional but recommended) |
| B: Daemon Lifecycle | 2-3 | Phase A |
| C: Stale Verify | 1 | Phase A, skeptical memory (from improvements plan) |
| D: Discovery | 1-2 | Phase A |
| E: OpenAI-Compat | 1 | Phase A |
| F: Cost Guards | 1 | Phase B |
| Total | 8-11 days |
Phases A + B are the MVP. Phases C-F are incremental enhancements that can be added over time.
Example User Experience
Setup (30 seconds)
# config/local.yaml
daemon:
enabled: true
interval_minutes: 60
llm:
model: gpt-4o-mini # cheap, fast, good enough for consolidation
Start
$ knowtation daemon start --background
Daemon started (PID 42891). Consolidation every 60 min when idle.
LLM: OpenAI gpt-4o-mini (from OPENAI_API_KEY).
Log: data/daemon.log
Check Status
$ knowtation daemon status
Status: running (PID 42891, uptime 3h 12m)
Last pass: 2026-04-04T14:00:00Z (processed 47 events, 5 topics, 3 consolidations written)
Next pass: ~2026-04-04T15:00:00Z (if idle)
Estimated cost today: $0.003
Manual Consolidation
$ knowtation memory consolidate --dry-run
[dry-run] Would process 23 events across 4 topics.
[dry-run] Topic "blockchain": 8 events → estimated 4-5 facts
[dry-run] Topic "architecture": 6 events → estimated 3-4 facts
[dry-run] Topic "testing": 5 events → estimated 2-3 facts
[dry-run] Topic "import": 4 events → estimated 2 facts
$ knowtation memory consolidate
Consolidated 23 events across 4 topics.
blockchain: 5 facts written (consolidation event mem_a1b2c3d4e5f6)
architecture: 4 facts written (consolidation event mem_f6e5d4c3b2a1)
testing: 3 facts written
import: 2 facts written
Index regenerated.
Using Ollama (Fully Local, No API Costs)
daemon:
enabled: true
interval_minutes: 30 # can run more often since it's free
idle_threshold_minutes: 5
llm:
provider: ollama
model: llama3.2
# base_url defaults to http://localhost:11434
Hosted Deployment (Hub)
Architecture
On the hosted Hub, users do not run a persistent daemon process. Instead:
- Consolidation is an API endpoint:
POST /api/v1/memory/consolidateon the existing bridge, using the same JWT auth andX-Vault-Idscoping as all other memory endpoints. - Scheduling is server-side: a lightweight cron worker (or queue consumer) checks which users are due for consolidation and triggers the endpoint on their behalf. This avoids N persistent processes for N users.
- LLM calls are server-side: the bridge calls the LLM directly (not the user's machine). The Hub provides the API key and model — the user does not need their own.
┌─────────────────────────────────────────────────────┐
│ Hub Infrastructure │
│ │
│ ┌──────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ Gateway │───▶│ Bridge │───▶│ LLM API │ │
│ │ (proxy) │ │ │ │ (GPT-4o- │ │
│ └──────────┘ │ POST /api/ │ │ mini etc) │ │
│ ▲ │ v1/memory/ │ └───────────┘ │
│ │ │ consolidate │ │
│ ┌────┴─────┐ └──────┬───────┘ │
│ │ Cron │───────────┘ │
│ │ Worker │ triggers per-user on schedule │
│ └──────────┘ │
└─────────────────────────────────────────────────────┘
Cost Model
Per-pass LLM cost using GPT-4o-mini ($0.15/1M input, $0.60/1M output):
| Pass intensity | Input tokens | Output tokens | Cost per pass |
|---|---|---|---|
| Light (50 events, 3 topics) | ~10K | ~3K | ~$0.003 |
| Medium (100 events, 5 topics) | ~25K | ~5K | ~$0.007 |
| Heavy (200 events, 5 topics) | ~40K | ~5K | ~$0.009 |
At default interval (every 2 hours, 12 passes/day): $0.04-$0.11/day or $1.20-$3.30/month per user.
Pricing Recommendation
Follow the same per-use metering pattern already in place for indexing and search:
- Add
consolidationas a metered usage type alongsideindexandsearchin the billing system. - Price per pass: $0.005-$0.01 per consolidation pass (covers LLM cost + margin).
- Bundle into tiers: e.g., "Pro includes 10 consolidation passes/month, $0.01/pass after that."
- No separate service: the endpoint lives on the existing bridge. No new infrastructure beyond the cron worker.
- No price increases needed: consolidation is additive revenue. Users who don't enable it pay nothing extra. Users who want aggressive consolidation (every hour) pay proportionally more.
Parity: Local vs Hosted
| Dimension | Local (Ollama) | Hosted (Hub) |
|---|---|---|
| LLM cost | $0 (user hardware) | Per-pass metered |
| LLM quality | Depends on model (llama3.2 is decent) | GPT-4o-mini or better (consistent quality) |
| Frequency | Unlimited (user controls) | Metered (cost scales with frequency) |
| Privacy | Full (nothing leaves machine) | Event metadata sent to Hub LLM (not note bodies) |
| Setup | Install Ollama + model | Zero setup (Hub manages LLM) |
| Availability | Only when machine is on | Always-on (server-side cron) |
| Feature parity | All passes available | All passes available |
The key tradeoff: local users get unlimited free runs but must manage their own hardware and model quality. Hosted users get zero-setup, always-on consolidation with higher-quality LLM output, but pay per use.
Implementation Notes
- The bridge endpoint reuses the same
consolidateMemory()function fromlib/memory-consolidate.mjs. - The cron worker is a simple script that queries the database for users with
daemon.enabled: trueand whose last consolidation was more thaninterval_minutesago. - Server-side LLM calls use a Hub-owned API key (configured in Hub environment, not user config).
- Per-user cost tracking: each pass logs its token usage to the billing meter.
- The
max_events_per_passandmax_topics_per_passlimits apply on hosted too, preventing any single user from generating outsized LLM costs.
Open Questions (for future refinement)
Should consolidations replace or supplement raw events? Current design: supplement only (consolidation events are additive). Raw events are never deleted by the daemon — only by retention pruning. This is conservative but means storage grows. A future "compact" mode could archive raw events after consolidation.
Multi-vault coordination. If a user has multiple vaults, should the daemon run one loop per vault or one shared loop? Current design: one loop, iterates over all configured vaults sequentially.
MCP sampling vs. direct LLM. The daemon currently calls the LLM directly via
completeChat/daemon-llm.mjs. An alternative is to use MCP sampling (request the host client's LLM viacreateMessage). This would let the daemon use whatever model the user's IDE is configured with, but adds coupling to MCP transport. Worth exploring as an optional mode.