IMPORT-SOURCES.md file-level

at sha256:6 · View file ↗ · Intel ↗

History
1 files
1 commits
0 hotspots
0 ๐ŸงŠ dead
0 ๐Ÿ’ฅ blast risk
sha256:4 fix(security): pin patched transitive deps to clear Dependabot moderateโ€ฆ · aaronrene · Jun 11, 2026
1 # Knowtation โ€” Import Sources and External Knowledge Bases
2
3 This document specifies how to bring data and memory **into** the Knowtation vault from other platforms and devices. It covers **live capture** (messages and events โ†’ inbox) and **batch and one-time imports** (exports and files). Capture and import both produce vault notes that conform to [SPEC ยง1โ€“2](./SPEC.md) (frontmatter, project, tags). The inbox contract for capture plugins is [CAPTURE-CONTRACT.md](./CAPTURE-CONTRACT.md).
4
5 ---
6
7 ## 1. Design principles
8
9 - **One vault format:** Every import produces Markdown notes with our frontmatter (or MIF-compatible notes that also satisfy our schema). No second schema inside the vault.
10 - **Traceable origin:** Every imported note has `source` (e.g. `chatgpt`, `claude`, `notebooklm`) and optional `source_id` (external id) and `date` so we can dedupe and attribute.
11 - **Re-import safe:** Importers should support idempotency (e.g. skip or update when `source` + `source_id` already exists) when the platform provides stable ids.
12 - **Agent- and human-friendly:** Imported notes are searchable, filterable by `--project` / `--tag`, and usable for blogs, podcasts, analysis, and marketing like any other vault content.
13
14 ---
15
16 ## 2. Live capture (messages โ†’ inbox)
17
18 **Capture** is the real-time path: platform messages and webhooks become **inbox notes** as they arrive. **Import** (ยง3) is for exports, uploads, and CLI batch runs. Both use the same frontmatter ideas (`source`, `source_id`, `date`).
19
20 **Hub endpoint:** `POST /api/v1/capture` with a JSON body. If the Hub has **`CAPTURE_WEBHOOK_SECRET`** set in its environment, clients must send header **`X-Webhook-Secret: <secret>`**.
21
22 ### Capture at a glance
23
24 | | Channel | Setup |
25 |--|---------|--------|
26 | ๐Ÿ’ฌ | **Slack** | Adapter default port **3132**; **`SLACK_SIGNING_SECRET`**; Slack Events API โ†’ adapter โ†’ Hub capture. Or Zapier/n8n: Slack trigger โ†’ HTTP POST to capture. |
27 | ๐ŸŽฎ | **Discord** | Adapter default port **3133**; webhook or bot POST โ†’ capture. Or Zapier/n8n โ†’ capture. |
28 | โœˆ๏ธ | **Telegram** | Adapter default port **3134**; Bot API webhook or simplified JSON โ†’ capture. |
29 | ๐Ÿ“ฑ | **WhatsApp** | No first-party adapter; use **Zapier**, **n8n**, or similar to forward messages to **`POST /api/v1/capture`**. |
30
31 **Scripts (self-hosted):** `scripts/capture-slack-adapter.mjs`, `capture-discord-adapter.mjs`, `capture-telegram-adapter.mjs`. Standalone local webhook: `node scripts/capture-webhook.mjs --port 3131` (see [CAPTURE-CONTRACT.md](./CAPTURE-CONTRACT.md)).
32
33 **Minimal JSON example:**
34
35 ```json
36 {"body": "text", "source": "slack"}
37 ```
38
39 **Details:** [CAPTURE-CONTRACT.md](./CAPTURE-CONTRACT.md) (plugin contract), [MESSAGING-INTEGRATION.md](./MESSAGING-INTEGRATION.md) (Slack / Discord / Telegram), [HUB-API.md](./HUB-API.md) ยง3.5 Capture.
40
41 ---
42
43 ## 3. Supported import sources (spec)
44
45 The CLI command **`knowtation import <source-type> <input> [options]`** accepts the following source types. Each maps an external format to vault notes.
46
47 **Canonical list:** The same `source_type` strings are enforced in [`lib/import-source-types.mjs`](../lib/import-source-types.mjs) for the CLI, self-hosted Hub `POST /api/v1/import`, Hub import modal, and the MCP `import` tool. If you add an importer, update that module and the table below together.
48
49 **Manual verification:** See [IMPORT-MANUAL-CHECKLIST.md](./IMPORT-MANUAL-CHECKLIST.md).
50
51 ### Hub browser: one upload, ZIP extraction, in-browser ZIP (4Aโ‚‚), multi-file (4B), and drop (4C)
52
53 The Hub Import modal can send **one** multipart `file` per `POST /api/v1/import`, or **several** sequential `POST` requests in one batch. When the uploaded filename ends with **`.zip`**, the self-hosted Hub and hosted bridge **extract the archive** to a temporary directory (with zip-slip checks) and pass that **directory path** to `runImport`โ€”the same pattern as a folder path on the CLI. **Phase 4Aโ‚‚:** the Hub can also build a **.zip in the browser** (JSZip) for tree-shaped `source_type` values and then upload that single `file`โ€”one HTTP round trip, same server contract. **Phase 4B:** **multiple** file selection, **โ€œChoose folder (ZIP in browser)โ€** (`webkitdirectory`), and (for **PDF, DOCX, and other single-file-per-importer** types) **sequential** imports with a combined progress/summary. **Phase 4C:** a **dashed drop zone** in the Import dialog accepts **dragged files or a folder** (Chromium: full tree via `DataTransfer` directory entries; other browsers: same as a flatter file list). Dropped content uses the same pipeline and caps as **Choose folder** and multi-file. Caps: **~100MB** per upload (multer), up to **5000** file entries in one client-built ZIP, up to **200** files in one **sequential** run; the whole in-browser zip step holds uncompressed bytes in **memory** (very large trees: zip on the desktop or use the CLI). Hosted: each request is also subject to gateway/bridge time limits (often on the order of **26s** on Netlify).
54
55 - **Folder-capable types** (for example **`markdown`**, **`chatgpt-export`**, **`claude-export`**, **`notebooklm`** when given a directory): use a pre-made **ZIP**, **Choose folder** (4Aโ‚‚), the **4C** drop target, or **multi-select** many files to produce a client **ZIP** when the Hub decides `client_zip` mode (see `web/hub/hub-client-import-zip.mjs`). The importer walks the tree and picks up each supported file (for `markdown`, **`.md` / `.markdown`** only; other extensions are skipped).
56 - **`pdf`** and **`docx`**: the importers require a **single file** on disk and **throw if the input is a directory** (`lib/importers/pdf.mjs`, `lib/importers/docx.mjs`). **Do not** upload a server-**ZIP** for these types in the Hub. **Many PDFs or DOCX** in the Hub: **4B** = **N sequential** `POST` imports (one file per request), or the CLI: **`knowtation import pdf` / `docx`** for folder paths.
57 - **Hosted MCP:** still **one** `import` call per file (no `import_batch` tool); see [PARITY-MATRIX-HOSTED.md](./PARITY-MATRIX-HOSTED.md).
58
59 **Reference:** [IMPORT-URL-AND-DOCUMENTS-PHASES.md](./IMPORT-URL-AND-DOCUMENTS-PHASES.md) Phases **4A**, **4Aโ‚‚**, **4B**, **4C** (4A + bulk copy, 4Aโ‚‚ + 4B + 4C **shipped** on `feat/import-url-documents-mcp`).
60
61 ### At a glance
62
63 | | Source | Type | Format |
64 |--|--------|------|--------|
65 | ๐Ÿค– | **ChatGPT** | `chatgpt-export` | ZIP or folder export |
66 | ๐Ÿง  | **Claude** | `claude-export` | Chat + memory export |
67 | ๐Ÿ’พ | **Mem0** | `mem0-export` | JSON memory export |
68 | ๐Ÿ“ | **Notion** | `notion` | API; page IDs + key |
69 | ๐ŸŽซ | **Jira** | `jira-export` | CSV export |
70 | ๐Ÿ““ | **NotebookLM** | `notebooklm` | Markdown or JSON |
71 | ๐Ÿ“ | **Google Drive** | `gdrive` | Markdown folder |
72 | ๐Ÿ“Š | **Generic CSV** | `generic-csv` | Any UTF-8 CSV; one note per data row |
73 | ๐Ÿงฉ | **JSON (array)** | `json-rows` | `.json` file; root must be an array of objects |
74 | ๐Ÿ“— | **Excel** | `excel-xlsx` | `.xlsx` (first sheet, one note per row) |
75 | ๐Ÿ‘ค | **vCard** | `vcf` | `.vcf` / `.vcard`, one note per contact under `โ€ฆ/contacts/vcf/` |
76 | ๐ŸŸฉ | **Google Sheets (API)** | `google-sheets` | Live read via API (spreadsheet id, not a file); see ยง below |
77 | ๐Ÿ“‹ | **Linear** | `linear-export` | CSV export |
78 | ๐Ÿ”— | **MIF** | `mif` | Memory Interchange Format |
79 | ๐Ÿ“„ | **Markdown** | `markdown` | File or folder |
80 | ๐Ÿ“• | **PDF** | `pdf` | Single `.pdf` file (text extraction) |
81 | ๐Ÿ“˜ | **DOCX** | `docx` | Single `.docx` file (Word โ†’ Markdown via mammoth) |
82 | ๐ŸŒ | **URL** | `url` | https URL (Hub **Import from URL** / `POST /api/v1/import-url`; CLI `knowtation import url โ€ฆ`) |
83 | ๐ŸŽ™๏ธ | **Audio** | `audio` | Whisper transcription |
84 | ๐Ÿ’ฐ | **Wallet CSV** | `wallet-csv` | Tx history; 11 formats |
85 | ๐Ÿ—„๏ธ | **Supabase** | `supabase-memory` | Memory table import |
86 | ๐Ÿฆž | **OpenClaw** | `openclaw` | Agent memory + chats |
87 | ๐Ÿชถ | **Hermes Agent** | `markdown` (from `~/.hermes/memories/`) | Agent MEMORY.md + USER.md |
88 | ๐Ÿ“ฅ | **Imports** | _(Hub UI + CLI)_ | Local files & team uploads |
89
90 **Live inbox capture** (Slack, Discord, Telegram, WhatsApp via automation) is not a CLI `import` type: use **`POST /api/v1/capture`** and the adapters in **ยง2**.
91
92 ### Full reference
93
94 | Source type | Input (path or URI) | Description |
95 |-------------------|-------------------------|-------------|
96 | ๐Ÿค– `chatgpt-export` | Path to OpenAI export ZIP or folder with `conversations.json` | ChatGPT data export (Settings โ†’ Export Data). One note per conversation or per message thread; frontmatter: `source: chatgpt`, `source_id`, `date`, optional `project`, `tags`. |
97 | ๐Ÿง  `claude-export` | Path to Claude export ZIP or folder (chat history / memory) | Claude data export (Settings โ†’ Privacy โ†’ Export) and/or memory export. One note per conversation or per memory entry; `source: claude`, `source_id`, `date`. |
98 | ๐Ÿ’พ `mem0-export` | Path to Mem0 export JSON | Mem0 memory export. One note per memory; `source: mem0`, `source_id`, `date`. |
99 | ๐Ÿ“ `notion` | Comma-separated Notion page IDs | Fetches pages as markdown via Notion API. Requires `NOTION_API_KEY`. One note per page; `source: notion`, `source_id: page_id`. |
100 | ๐ŸŽซ `jira-export` | Path to Jira CSV file (or folder with one .csv) | Jira Cloud/Server CSV export. One note per issue; `source: jira`, `source_id: issue key`, `title` from summary, and **`import_column_headers`** (JSON array as a string) plus a body section **All CSV fields (JSON)** with the **entire** row (every column) for search. The top of the body still foregrounds **Summary** and **Description** for quick reading. |
101 | ๐Ÿ““ `notebooklm` | Path to folder of .md files or to a .json export | NotebookLM: folder of markdown (e.g. from takeout/Apify) or JSON with sources/conversations array. One note per file or entry; `source: notebooklm`. |
102 | ๐Ÿ“ `gdrive` | Path to folder of Markdown files | Google Drive: folder of .md files (e.g. from export or pandoc). One note per file; `source: gdrive`, `source_id` from filename. |
103 | ๐Ÿ“Š `generic-csv` | Path to a **single** `.csv` file (UTF-8; optional BOM) | **Tabular** import: first row = headers, each following row = one note. For each row, the body lists **every** column as a bullet (**no columns dropped**), a **`## Full row (JSON)`** code block with the same values (stable keys; duplicate column names get `__2`, `__3` suffixes), and frontmatter **`import_column_headers`** (JSON string of the header list). Frontmatter: `source: csv-import`, **`title`** (includes the **filename** and, if a column named **`title`**, **`name`**, **`subject`**, **`summary`**, or **`label`** has a value, that value; otherwise **`filename (row N)`** for that data row), `source_id` (from `id` / `uuid` / `key` column if present, else content hash), `csv_file`, `row_index`, `date`. The note body H1 matches **`title`**. Max **10,000** data rows, **50 MB** file, **32,000** chars per cell (cell truncation noted in JSON if applied). **Google Sheets (download):** *File โ†’ Download โ†’ Comma-separated values (.csv)* then import. |
104 | ๐Ÿงฉ `json-rows` | Path to a **single** `.json` file whose **root** is a **JSON array of plain objects** (not arrays inside the root array) | One note per object. Frontmatter: `source: json-import`, `source_id` (from `id`, `uuid`, or `source_id` if present, else hash of object), `json_file`, `item_index`, optional `title` (from `title` or `name` string), `date`. Body: full object in a fenced `json` code block. Max **10,000** objects, **50 MB** file. **Not** a substitute for `claude-export` / `mem0-export` (those are platform-specific shapes). |
105 | ๐Ÿ“— `excel-xlsx` | Path to a **single** `.xlsx` file (Office Open XML) | **Tabular** import from the **first worksheet** only, same model as `generic-csv` (header row, one note per data row, bullets + **`## Full row (JSON)`** + **`import_column_headers`**, same **`title`** rules). Parsed with **`exceljs`** (not the legacy `xlsx` / SheetJS community package). `source: xlsx-import`, `xlsx_file`, `row_index`, `date`. **Legacy** `.xls` is not supported. Max **50 MB** file, **10,000** rows, **32,000** characters per cell (truncated). |
106 | ๐Ÿ‘ค `vcf` | Path to a **single** `.vcf` (or `.vcard`) | One note per `BEGIN:VCARD โ€ฆ END:VCARD` block. `source: vcf-import`, `vcf_file`, `vcf_index`, `source_id` (vCard `UID` if present, else hash), `title` from `FN` when possible. Path: **`<inbox or project>/contacts/vcf/โ€ฆ`**. Fenced raw block in each note. Max **20 MB** file, **20,000** cards. |
107 | ๐ŸŸฉ `google-sheets` | **Spreadsheet id** string (the long id in a `docs.google.com/spreadsheets/d/<id>` URL) โ€” not a file path for normal use | **Google Sheets API** read-only. Same tabular model as `generic-csv` (default: **first tab**, **A1:ZZ10000**; override with `sheets_range` / Hub field **Range** or CLI `--sheets-range 'Sheet1!A1:E500'`), including the same **`title`**, **bullet list**, **`## Full row (JSON)`**, and **`import_column_headers`** model as `generic-csv` (the โ€œfileโ€ part of the title is the **spreadsheet id**; add a `name`/`title`/โ€ฆ column for a readable second half; otherwise **`spreadsheet_id (row N)`**). Frontmatter: `source: google-sheets-import`, `title`, `spreadsheet_id`, `row_index`, `date`, **`import_column_headers`**. **Auth:** a **service account** JSON. Set `GOOGLE_SERVICE_ACCOUNT_JSON` (inline JSON) or `GOOGLE_APPLICATION_CREDENTIALS` (path to the key file). The spreadsheet must be **shared with the service account email (Viewer** is enough) if it is not owned by that project. The **self-hosted bridge** and **any process running `runImport` for this type** need this env. **Hub / gateway:** `POST /api/v1/import` with `source_type=google-sheets`, `spreadsheet_id`, optional `sheets_range` โ€” **no** `file` (multipart can omit the file part). If you only have a CSV, use **File โ†’ Download โ†’ Comma separated values** and `generic-csv` instead. |
108 | ๐Ÿ“‹ `linear-export` | Path to Linear CSV file | Linear workspace export (CSV). One note per issue; `source: linear`, `source_id`, `title` when the export has a title field, and **`import_column_headers`** + **All CSV fields (JSON)** in the body so **all exported columns** remain searchable (not only id / title / description). |
109 | ๐Ÿ”— `mif` | Path to `.memory.md` or `.memory.json` or folder of MIF files | [Memory Interchange Format](https://mif-spec.dev/). MIF is Obsidian-native; files can be copied in as-is or normalized to our frontmatter. |
110 | ๐Ÿ“„ `markdown` | Path to file or folder of Markdown files | Generic Markdown import. Preserve or infer frontmatter; add `source: markdown`, `date` if missing. For Evernote/Standard Notes/etc. exports that are already Markdown. **Hub:** a **ZIP of a folder tree** of `.md` / `.markdown` files is supported (server extracts then walks the tree). |
111 | ๐Ÿ“• `pdf` | Path to a single `.pdf` file | Extracts plain text with PDF.js (via **unpdf**). One note under `inbox/imports/pdf/` (or project inbox); frontmatter: `source: pdf-import`, `source_id` (SHA-256 of file bytes), `pdf_file`, `pdf_pages`, `date`, `title`. Fails if no text can be extracted (e.g. some image-only scans). **Hub / hosted MCP:** multipart `POST /api/v1/import` or MCP **`import`** with `source_type: pdf` and file bytes (same as other file-based imports). **Hub:** upload the **`.pdf` file**, not a ZIP (ZIP is extracted to a directory; this importer requires a file). |
112 | ๐Ÿ“˜ `docx` | Path to a single `.docx` file | Converts to Markdown with **mammoth** (Office Open XML only; not binary `.doc`). One note under `inbox/imports/docx/` (or project inbox); frontmatter: `source: docx-import`, `source_id` (SHA-256 of file bytes), `docx_file`, `date`, `title`. Fails on corrupt files or empty documents. **Hub / hosted MCP:** same multipart / **`import`** pattern as PDF. **Hub:** upload the **`.docx` file**, not a ZIP (same directory-vs-file rule as PDF). |
113 | ๐ŸŒ `url` | **HTTPS URL string** (not a filesystem path) | Fetches the URL server-side with SSRF protections. One note under `inbox/imports/url/` (or project inbox); frontmatter: `source: url-import`, `source_id` (hash of canonical URL), `canonical_url`, `date`, `title`. Modes: **`auto`** (extract main article HTML when possible, else bookmark), **`bookmark`** (link + metadata only), **`extract`** (requires readable article HTML or error). **Hub / hosted:** `POST /api/v1/import-url` JSON `{ "url", "mode"?, "project"?, "output_dir"?, "tags"? }`. **CLI:** `knowtation import url "https://โ€ฆ" [--url-mode auto|bookmark|extract]`. Paywalled or bot-blocked pages: use **`bookmark`**. |
114 | ๐ŸŽ™๏ธ `audio` | Path to audio file or URL (e.g. wearable webhook payload) | **Primary path for in-Hub transcription** (self-hosted). OpenAI Whisper; **max ~25 MB** per file. One note per file; frontmatter: `source: audio`, `source_id`, `date`. |
115 | ๐Ÿ’ฐ `wallet-csv` | Path to wallet/exchange transaction history CSV (or folder containing one .csv) | Converts wallet export files into vault notes with blockchain frontmatter. One note per row; `source: wallet-csv-import`, `source_id: tx_hash`, blockchain fields (`network`, `wallet_address`, `tx_hash`, `payment_status`, `amount`, `currency`, `direction`, `confirmed_at`, `block_height`). Notes land in `inbox/wallet-import/`. Auto-detects named formats: **Coinbase**, **Coinbase Pro**, **Exodus**, **ICP Rosetta**, **Kraken**, **Binance**, **MetaMask/Etherscan**, **Phantom (Solana)**, **Ledger Live**. Falls back to generic column alias matching for any other CSV. Re-import is safe: duplicate rows (same output path) are skipped. |
116 | ๐Ÿ—„๏ธ `supabase-memory` | Supabase connection + table name | Import memory rows from a Supabase table. For users coming from database-centric stacks. |
117 | ๐Ÿฆž `openclaw` | Path to OpenClaw data export or memory dump | Import agent conversations and memory from [OpenClaw](https://github.com/openclaw/openclaw). One note per conversation or memory entry; `source: openclaw`, `source_id`, `date`. |
118 | ๐Ÿชถ **Hermes Agent** | `~/.hermes/memories/MEMORY.md`, `USER.md`, or `hermes memory export` JSON | [Hermes Agent](https://github.com/NousResearch/hermes-agent) built-in memory lives under `~/.hermes/memories/`. Import with `knowtation import markdown` on those files, or export via `hermes memory export` / `hermes backup` first. Tag notes `hermes` for filters. Wire ongoing access via Hub API + MCP (Settings โ†’ Integrations). |
119
120 > **Video:** CLI and MCP still support `knowtation import video <file>` (same Whisper pipeline as audio), but video files are usually over 25 MB. Export audio first or transcribe with another service and import as Markdown.
121
122 **Options (common):** `--project <slug>`, `--output-dir <vault-path>`, `--tags tag1,tag2`, `--dry-run`, `--json`. **`google-sheets` only:** `--sheets-range 'A1-notation'`. If `--output-dir` is omitted, default is `vault/inbox/` or `vault/projects/<project>/inbox/` when `--project` is set.
123
124 > **See also:** [Templates and Skills](./TEMPLATES-AND-SKILLS.md) โ€” starter vault templates, agent skill packs, and how they compose with import sources.
125
126 ---
127
128 ## 4. Platform-specific notes
129
130 ### ๐Ÿค– 4.1 ChatGPT (OpenAI)
131
132 - **How users get data:** Settings โ†’ Data Controls โ†’ Export Data (or Privacy Portal). Email link to ZIP containing `conversations.json` (and sometimes `chat.html`). Link expires in 24 hours; export can take up to 7 days.
133 - **Format:** `conversations.json` is a tree of messages (mapping of id โ†’ { message, parent, children }). Each message has `content.parts`, `author.role`, timestamps.
134 - **Importer behavior:** Parse `conversations.json`; for each conversation, produce one note (or one per thread) with body = concatenated or structured transcript. Frontmatter: `source: chatgpt`, `source_id: <conversation-id>`, `date`, `title` from conversation title. Optional: one note per message for fine-grained search (heavier).
135 - **Third-party:** Browser extensions (e.g. ChatGPT Exporter) can export per-conversation JSON/Markdown/HTML; importer can accept a folder of such files and treat as `chatgpt-export` or `markdown` with `source: chatgpt`.
136
137 ### ๐Ÿง  4.2 Claude (Anthropic)
138
139 - **How users get data:** Settings โ†’ Privacy โ†’ Export data (chat history). Memory: Settings โ†’ Capabilities โ†’ View and edit your memory โ†’ export (and Claude supports importing memory from other AI providers via a prompt).
140 - **Format:** Export is account data (format may vary). Memory export is a user-facing list that can be copied; API or file format TBD.
141 - **Importer behavior:** Same pattern as ChatGPT: one note per conversation or per memory entry; `source: claude`, `source_id`, `date`. Third-party tools (e.g. claude-exporter) produce JSON/Markdown; we can accept that as `claude-export` or `markdown` with `source: claude`.
142
143 ### ๐Ÿ’พ 4.3 Mem0
144
145 - **How users get data:** Mem0 API: `create_memory_export()` with schema; then retrieve via `get_memory_export()`. Returns JSON (Pydantic-style schema).
146 - **Importer behavior:** Map Mem0 memories to vault notes. Each memory โ†’ one note; frontmatter can include Mem0 metadata (e.g. `mem0_id`, `user_id`) and our `source: mem0`, `source_id`, `date`. Optionally support MIF as output so Mem0 users can later use MIF-native tools.
147
148 ### ๐Ÿ“ 4.4 Notion
149
150 - **How users get data:** Notion API: create an integration at notion.so/my-integrations, share pages with it, then use page IDs (from the page URL: notion.so/workspace/page_id).
151 - **Importer behavior:** `knowtation import notion <page_id>` or `knowtation import notion "id1,id2,id3"`. Requires `NOTION_API_KEY`. Fetches each page as markdown via `GET /v1/pages/{page_id}/markdown`; one note per page with `source: notion`, `source_id: page_id`.
152
153 ### ๐ŸŽซ๐Ÿ“‹ 4.5 Jira and Linear
154
155 - **Jira:** Export from Jira (list or search โ†’ Export CSV). Importer: `knowtation import jira-export /path/to/export.csv --output-dir imports/jira`. Maps Issue key, Summary, Description, Project to vault notes.
156 - **Linear:** Export from Linear (Command menu โ†’ Export data โ†’ CSV). Importer: `knowtation import linear-export /path/to/linear-export.csv --project myproject`.
157
158 ### ๐Ÿ““๐Ÿ“ 4.6 NotebookLM and Google Drive
159
160 - **NotebookLM:** Accepts (1) a folder of markdown files (e.g. from Google takeout or third-party Apify export), or (2) a JSON file with an array of entries (`content`, `id`, `title`). One note per file or entry; `source: notebooklm`.
161 - **Google Drive:** Accepts a folder of Markdown files. Export Docs as .docx then convert to .md (e.g. pandoc), or use a sync script. Importer: `knowtation import gdrive /path/to/folder`; `source: gdrive`, `source_id` from filename.
162
163 ### ๐Ÿ“š 4.7 Confluence
164
165 - **How users get data:** Confluence has no native markdown export. Use third-party tools (e.g. confluence-cli, nodejs-confluence-export) to export a space or page to a folder of markdown files.
166 - **Importer behavior:** Export to a folder with one of those tools, then run `knowtation import markdown /path/to/confluence-export --output-dir imports/confluence --tags confluence`. Optional: add a thin `confluence-export` importer that accepts the same folder and sets `source: confluence`, `source_id` from filename.
167
168 ### ๐Ÿ”— 4.8 MIF (Memory Interchange Format)
169
170 - **What it is:** [mif-spec.dev](https://mif-spec.dev/) โ€” vendor-neutral AI memory format. Dual representation: `.memory.md` (Markdown + YAML frontmatter) and `.memory.json` (JSON-LD). Obsidian-native.
171 - **Importer behavior:** Copy `.memory.md` into vault (they are already valid Obsidian notes). Optional: normalize to our frontmatter (e.g. map `mif:id` to `source_id`, add `source: mif`). No need to change body. Enables future interop with Mem0, Zep, etc. if they adopt MIF.
172
173 ### ๐ŸŽ™๏ธ 4.9 Audio and video (including wearables)
174
175 - **Product note:** **Audio** is the recommended path for in-app transcription (smaller files, usually under OpenAIโ€™s **25&nbsp;MB** per-request limit). **Video** in the **self-hosted Hub** import dialog is **coming soon**; use **`knowtation import video`** from the CLI (same limit), or strip audio / transcribe elsewhere and import **Markdown**.
176 - **Smart glasses / wearables:** Devices (e.g. TranscribeGlass, Omi, Ray-Ban + GlassFlow, ViveGlass) often produce transcripts via app, webhook, or export. Omi supports webhooks for real-time transcript delivery. TranscribeGlass and similar may export text or send to a URL.
177 - **Importer behavior:** `import audio <file>` or `import video <file>` transcribes via OpenAI Whisper (`OPENAI_API_KEY` required) โ†’ one note with transcript as body; frontmatter `source: audio` or `source: video`, `source_id`, `date`. Formats: mp3, mp4, mpeg, mpga, m4a, wav, webm. The API rejects uploads over **25&nbsp;MB** (see `WHISPER_MAX_FILE_BYTES` in `lib/transcribe.mjs`). **Self-hosted:** if **ffmpeg** is available, Knowtation may transcode down first (`lib/ffmpeg-whisper-transcode.mjs`; disable via `transcription.transcode_oversized: false` or `KNOWTATION_TRANSCODE_OVERSIZED=0`). Webhook receivers (e.g. Omi) can write transcripts directly to inbox per message-interface contract.
178 - **Past blogs/videos:** User exports blog text or video transcript (or uses our transcription). Import as `markdown` or `audio`/`video` so historical content lives in the vault.
179
180 ---
181
182 ## 5. CLI surface (summary)
183
184 - **Command:** `knowtation import <source-type> <input> [--project <slug>] [--output-dir <path>] [--tags t1,t2] [--dry-run] [--json]`
185 - **Behavior:** Run the importer for the given source type; write notes to vault; optionally run indexer after (config or flag). Output: list of written paths; with `--json`, machine-readable summary.
186 - **Exit codes:** 0 success, 1 usage error, 2 runtime error (same as rest of CLI).
187
188 ---
189
190 ## 6. What we're not forgetting
191
192 - **Any audio:** Smart glasses, wearables, past blogs/videos, recordings โ†’ transcription (when under **25&nbsp;MB**) or external transcript โ†’ vault note with `source` and `source_id`.
193 - **Any knowledge base:** Google Drive, NotebookLM, ChatGPT, Claude, Mem0, Evernote/Standard Notes (as Markdown), MIF โ†’ all have a defined `import` path into the vault.
194 - **Any agent or business use:** Once in the vault, content is searchable, project/tag-filterable, and usable for blogs, podcasts, videos, marketing, analysis, writing. No second-class content.
195
196 Implementors: follow [SPEC.md](./SPEC.md) import contracts and extend `lib/importers/` with tests.
197
198 ---
199
200 ## 7. How to run (examples)
201
202 **Markdown** (file or folder):
203 ```bash
204 knowtation import markdown ./my-notes.md --output-dir imports/notes
205 knowtation import markdown ./exported-folder --project myproject
206 ```
207
208 **URL** (https only; server-side fetch with SSRF limits):
209
210 ```bash
211 knowtation import url "https://example.com/article" --project research --tags reading
212 knowtation import url "https://example.com/paywalled" --url-mode bookmark --dry-run --json
213 ```
214
215 **Hub / hosted:** Import modal โ†’ paste URL โ†’ choose **URL capture mode** โ†’ Import; or `POST /api/v1/import-url` with JSON body (same route on self-hosted Hub and hosted gateway โ†’ bridge).
216
217 **ChatGPT export** โ€” Extract the OpenAI ZIP first, then:
218 ```bash
219 knowtation import chatgpt-export /path/to/extracted-folder --output-dir imports/chatgpt --tags chatgpt
220 ```
221 The folder must contain `conversations.json` (Settings โ†’ Data Controls โ†’ Export Data).
222
223 **Claude export** โ€” Folder of .md files (from third-party exporters) or JSON:
224 ```bash
225 knowtation import claude-export /path/to/claude-export-folder --project myproject
226 ```
227
228 **MIF** (.memory.md or folder):
229 ```bash
230 knowtation import mif ./my-memories.memory.md --output-dir imports/mif
231 ```
232
233 **Mem0 export** (JSON file):
234 ```bash
235 knowtation import mem0-export ./mem0-export.json --project memories
236 ```
237
238 **Notion** (requires NOTION_API_KEY; page IDs from Notion page URLs):
239 ```bash
240 export NOTION_API_KEY=your_integration_secret
241 knowtation import notion "page-uuid-1,page-uuid-2" --output-dir imports/notion --project myproject
242 ```
243
244 **Jira** (CSV from Jira export):
245 ```bash
246 knowtation import jira-export ./jira-export.csv --output-dir imports/jira --tags jira
247 ```
248
249 **NotebookLM** (folder of .md or JSON export):
250 ```bash
251 knowtation import notebooklm ./notebooklm-export-folder --output-dir imports/notebooklm
252 knowtation import notebooklm ./notebooklm-sources.json --project research
253 ```
254
255 **Google Drive** (folder of markdown files):
256 ```bash
257 knowtation import gdrive /path/to/docs-as-markdown --output-dir imports/gdrive --project docs
258 ```
259
260 **Linear** (CSV from Linear export):
261 ```bash
262 knowtation import linear-export ./linear-export.csv --output-dir imports/linear --project myapp
263 ```
264
265 **Audio** (transcription via OpenAI Whisper; requires `OPENAI_API_KEY`; **max ~25&nbsp;MB** per file):
266 ```bash
267 knowtation import audio ./recording.m4a --project born-free --output-dir media/audio
268 ```
269 **Video** (same pipeline and limit; prefer exporting **audio** for long content):
270 ```bash
271 knowtation import video ./short-clip.mp4 --output-dir media/video
272 ```
273
274 **Wallet / exchange CSV** โ€” one note per transaction row, format auto-detected:
275 ```bash
276 # Generic CSV with any recognized headers
277 knowtation import wallet-csv ./wallet-export.csv --tags payment,on-chain
278
279 # Coinbase (Date, Transaction Type, Asset, Quantity Transacted, โ€ฆ)
280 knowtation import wallet-csv ./coinbase-export.csv --tags coinbase,payment
281
282 # Coinbase Pro / Advanced Trade (portfolio, type, time, amount, amount/balance unit, โ€ฆ)
283 knowtation import wallet-csv ./coinbase-pro-fills.csv --tags coinbase-pro
284
285 # Exodus (DATE, TYPE, FROMAMOUNT, FROMCURRENCY, TXID, โ€ฆ)
286 knowtation import wallet-csv ./exodus-transactions.csv --tags exodus
287
288 # ICP Rosetta (hash, block_index, timestamp, type, account, amount)
289 knowtation import wallet-csv ./icp-rosetta.csv --tags icp,on-chain
290
291 # Kraken ledger export (txid, refid, time, type, aclass, asset, amount, fee, balance)
292 knowtation import wallet-csv ./kraken-ledgers.csv --tags kraken,payment
293
294 # Binance deposit/withdrawal (Date(UTC), Coin, Network, Amount, TXID, Status, โ€ฆ)
295 knowtation import wallet-csv ./binance-history.csv --tags binance
296
297 # Binance spot wallet history (UTC_Time, Account, Operation, Coin, Change, Remark)
298 knowtation import wallet-csv ./binance-spot-wallet.csv --tags binance
299
300 # MetaMask / Etherscan address export (Txhash, Blockno, DateTime (UTC), From, To,
301 # Value_IN(ETH), Value_OUT(ETH), Status, โ€ฆ)
302 knowtation import wallet-csv ./etherscan-export.csv --tags metamask,eth
303
304 # Phantom wallet (Transaction ID, Date, Type, Amount, Token, Status, Fee (SOL), Signature)
305 knowtation import wallet-csv ./phantom-history.csv --tags phantom,solana
306
307 # Ledger Live (Operation Date, Currency ticker, Operation Amount, Operation Hash, โ€ฆ)
308 knowtation import wallet-csv ./ledger-live-export.csv --tags ledger
309
310 # From Hub UI: Import modal โ†’ Source type โ†’ Wallet / exchange CSV โ†’ upload .csv file
311 ```
312
313 Notes land in `inbox/wallet-import/<YYYY-MM-DD>-<tx_hash_prefix>.md`.
314 Re-importing the same CSV is safe โ€” rows with an existing output path are skipped.
315
316 ### Named format auto-detection
317
318 The importer fingerprints the CSV header to pick the right normaliser automatically.
319 No user action required โ€” just upload/pass the CSV as-is.
320
321 | Format | Fingerprint headers | `network` set to |
322 |--------|--------------------|--------------------|
323 | **Coinbase** | `Quantity Transacted`, `Transaction Type` | `coinbase` |
324 | **Coinbase Pro** | `portfolio`, `amount/balance unit` | `coinbase-pro` |
325 | **Exodus** | `FROMAMOUNT`, `FROMCURRENCY` | _(from row)_ |
326 | **ICP Rosetta** | `hash`, `block_index` (โ‰ค10 cols) | `icp` |
327 | **Kraken** | `refid`, `aclass` or `asset` | `kraken` |
328 | **Binance deposit/withdrawal** | `Date(UTC)`, `Coin` | from `Network` column |
329 | **Binance spot wallet** | `UTC_Time`, `Coin` | `binance` |
330 | **MetaMask / Etherscan** | `Value_IN(ETH)` or `Value_OUT(ETH)` or `Blockno` | `ethereum` |
331 | **Phantom** | `Signature` or `fee (sol)`, `token` | `solana` |
332 | **Ledger Live** | `Operation Date`, `Currency ticker` | inferred from ticker |
333 | **Generic** | any CSV with recognised aliases | from `network` column |
334
335 ### Generic column alias table
336
337 For CSVs not matching a named format, the importer resolves these aliases (case-insensitive):
338
339 | Canonical field | CSV column aliases |
340 |-------------------|--------------------|
341 | `tx_hash` | `txhash`, `transaction_hash`, `hash`, `tx id`, `txid`, `transaction id`, `transaction_id` |
342 | `confirmed_at` | `date`, `timestamp`, `time`, `confirmed at`, `confirmed_at`, `block time`, `block_time` |
343 | `amount` | `amount`, `value`, `quantity` |
344 | `currency` | `currency`, `asset`, `token`, `coin`, `symbol` |
345 | `direction` | `type`, `direction`, `side` โ€” `buy`/`receive`/`deposit`/`earn` โ†’ `received`; `sell`/`send`/`withdrawal` โ†’ `sent`; `swap`/`trade` โ†’ as-is |
346 | `payment_status` | `status` โ€” `completed`/`success`/`confirmed` โ†’ `settled`; `pending` โ†’ `pending`; `failed`/`error`/`rejected` โ†’ `failed` |
347 | `wallet_address` | `from`, `to`, `address`, `wallet`, `sender`, `recipient`, `from_address`, `to_address` |
348 | `network` | `network`, `chain`, `blockchain` |
349 | `block_height` | `block`, `block number`, `block_number`, `block height`, `block_height` |
350
351 ### Example note produced
352
353 ```markdown
354 ---
355 title: ICP transfer โ€” 500 ICP sent
356 date: 2026-04-02
357 source: wallet-csv-import
358 source_id: 8a3c0d1b2e4f
359 network: icp
360 wallet_address: rrkah-fqaaa-aaaaa-aaaaq-cai
361 tx_hash: 8a3c0d1b2e4f
362 payment_status: settled
363 amount: 500
364 currency: ICP
365 direction: sent
366 confirmed_at: 2026-04-02T18:12:44Z
367 block_height: 12345678
368 tags: [payment, on-chain, icp-tx]
369 ---
370
371 Transaction imported from wallet CSV export.
372 Amount: 500 ICP | Direction: sent | Status: settled
373 Block: 12,345,678 | Confirmed: 2026-04-02 18:12:44 UTC
374 ```
375
376 **Dry run** (preview without writing):
377 ```bash
378 knowtation import markdown ./notes --dry-run --json
379 ```