# Knowtation: Import Sources and External Knowledge Bases

This document specifies how to bring data and memory **into** the Knowtation vault from other platforms and devices. It covers **live capture** (messages and events → inbox) and **batch and one-time imports** (exports and files). Capture and import both produce vault notes that conform to [SPEC §1–2](./SPEC.md) (frontmatter, project, tags). The inbox contract for capture plugins is [CAPTURE-CONTRACT.md](./CAPTURE-CONTRACT.md).

---

## 1. Design principles

- **One vault format:** Every import produces Markdown notes with our frontmatter (or MIF-compatible notes that also satisfy our schema). No second schema inside the vault.
- **Traceable origin:** Every imported note has `source` (e.g. `chatgpt`, `claude`, `notebooklm`), an optional `source_id` (external id), and `date`, so we can dedupe and attribute.
- **Re-import safe:** Importers should support idempotency (e.g. skip or update when `source` + `source_id` already exists) when the platform provides stable ids.
- **Agent- and human-friendly:** Imported notes are searchable, filterable by `--project` / `--tag`, and usable for blogs, podcasts, analysis, and marketing like any other vault content.
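
The re-import rule can be sketched as a lookup keyed on `source` + `source_id`. This is a minimal illustration, not Knowtation's importer code: `ImportLedger` and `should_write` are hypothetical names, frontmatter is assumed to be already parsed into a dict, and in practice the ledger would be seeded from notes already in the vault. Only the *skip* policy is shown; an *update* policy would overwrite the existing note instead. Notes without a `source_id` cannot be deduplicated this way, so they are always written.

```python
from dataclasses import dataclass, field


@dataclass
class ImportLedger:
    """Hypothetical index of (source, source_id) pairs already present in the vault."""

    seen: set = field(default_factory=set)

    def should_write(self, frontmatter: dict) -> bool:
        source = frontmatter.get("source")
        source_id = frontmatter.get("source_id")
        if not source or not source_id:
            # No stable external id: nothing to dedupe against, so write the note.
            return True
        key = (str(source), str(source_id))
        if key in self.seen:
            return False  # same external item imported before: skip it
        self.seen.add(key)
        return True


ledger = ImportLedger()
print(ledger.should_write({"source": "chatgpt", "source_id": "conv-1"}))  # True
print(ledger.should_write({"source": "chatgpt", "source_id": "conv-1"}))  # False
print(ledger.should_write({"source": "claude", "source_id": "conv-1"}))   # True
```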

---

## 2. Live capture (messages → inbox)

**Capture** is the real-time path: platform messages and webhooks become **inbox notes** as they arrive. **Import** (§3) is for exports, uploads, and CLI batch runs. Both use the same frontmatter fields (`source`, `source_id`, `date`).

**Hub endpoint:** `POST /api/v1/capture` with a JSON body. If the Hub has **`CAPTURE_WEBHOOK_SECRET`** set in its environment, clients must send the header **`X-Webhook-Secret: <secret>`**.

### Capture at a glance

| Channel | Setup |
|---------|-------|
| **Slack** | Adapter default port **3132**; **`SLACK_SIGNING_SECRET`**; Slack Events API → adapter → Hub capture. Or Zapier/n8n: Slack trigger → HTTP POST to capture. |
| **Discord** | Adapter default port **3133**; webhook or bot POST → capture. Or Zapier/n8n → capture. |
| **Telegram** | Adapter default port **3134**; Bot API webhook or simplified JSON → capture. |
| **WhatsApp** | No first-party adapter; use **Zapier**, **n8n**, or similar to forward messages to **`POST /api/v1/capture`**. |

**Scripts (self-hosted):** `scripts/capture-slack-adapter.mjs`, `capture-discord-adapter.mjs`, `capture-telegram-adapter.mjs`. Standalone local webhook: `node scripts/capture-webhook.mjs --port 3131` (see [CAPTURE-CONTRACT.md](./CAPTURE-CONTRACT.md)).

**Minimal JSON example:**

```json
{"body": "text", "source": "slack"}
```
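
Clients that are not one of the adapters above can post the same JSON directly. A minimal standard-library sketch: `build_capture_request` is a hypothetical helper, `https://hub.example.com` is a placeholder Hub URL, and only the two fields from the minimal example are sent (the full field list is in CAPTURE-CONTRACT.md). It builds the request without sending it; passing it to `urllib.request.urlopen` would deliver it.

```python
import json
import urllib.request
from typing import Optional


def build_capture_request(hub_url: str, body: str, source: str,
                          secret: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/capture request."""
    payload = json.dumps({"body": body, "source": source}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if secret:
        # Required only when the Hub has CAPTURE_WEBHOOK_SECRET set.
        headers["X-Webhook-Secret"] = secret
    return urllib.request.Request(
        hub_url.rstrip("/") + "/api/v1/capture",
        data=payload,
        headers=headers,
        method="POST",
    )


req = build_capture_request("https://hub.example.com", "text", "slack", secret="my-secret")
# urllib.request.urlopen(req) would deliver it; omitted here.
```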

**Details:** [CAPTURE-CONTRACT.md](./CAPTURE-CONTRACT.md) (plugin contract), [MESSAGING-INTEGRATION.md](./MESSAGING-INTEGRATION.md) (Slack / Discord / Telegram), [HUB-API.md](./HUB-API.md) §3.5 Capture.

---

## 3. Supported import sources (spec)

The CLI command **`knowtation import <source-type> <input> [options]`** accepts the following source types. Each maps an external format to vault notes.

**Canonical list:** The same `source_type` strings are enforced in [`lib/import-source-types.mjs`](../lib/import-source-types.mjs) for the CLI, the self-hosted Hub `POST /api/v1/import`, the Hub import modal, and the MCP `import` tool. If you add an importer, update that module and the table below together.

**Manual verification:** See [IMPORT-MANUAL-CHECKLIST.md](./IMPORT-MANUAL-CHECKLIST.md).

### Hub browser: one upload, ZIP extraction, in-browser ZIP (4A'), multi-file (4B), and drop (4C)

The Hub Import modal can send **one** multipart `file` per `POST /api/v1/import`, or **several** sequential `POST` requests in one batch. When the uploaded filename ends with **`.zip`**, the self-hosted Hub and hosted bridge **extract the archive** to a temporary directory (with zip-slip checks) and pass that **directory path** to `runImport`, the same pattern as a folder path on the CLI.

- **Phase 4A':** the Hub can also build a **.zip in the browser** (JSZip) for tree-shaped `source_type` values and then upload that single `file`: one HTTP round trip, same server contract.
- **Phase 4B:** **multiple** file selection, **"Choose folder (ZIP in browser)"** (`webkitdirectory`), and, for **PDF, DOCX, and other single-file-per-importer** types, **sequential** imports with a combined progress/summary.
- **Phase 4C:** a **dashed drop zone** in the Import dialog accepts **dragged files or a folder** (Chromium: full tree via `DataTransfer` directory entries; other browsers: a flatter file list). Dropped content uses the same pipeline and caps as **Choose folder** and multi-file.
- **Caps:** **~100 MB** per upload (multer), up to **5000** file entries in one client-built ZIP, and up to **200** files in one **sequential** run. The in-browser zip step holds all uncompressed bytes in **memory**, so for very large trees, zip on the desktop or use the CLI.
- **Hosted:** each request is also subject to gateway/bridge time limits (often on the order of **26 s** on Netlify).
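
A zip-slip entry is an archive member such as `../../x.md` whose path resolves outside the extraction directory. The check can be sketched in Python as below; it illustrates the technique, not the Hub's Node implementation, and `safe_extract` / `UnsafeArchiveError` are hypothetical names. Python's `zipfile` extraction already strips `..` and absolute components on its own; the explicit pre-check shows the rule a server should enforce before extracting anything, in any language: reject the whole archive if any entry escapes.

```python
import os
import zipfile


class UnsafeArchiveError(ValueError):
    """Raised when a ZIP entry would land outside the extraction directory."""


def safe_extract(zip_path: str, dest_dir: str) -> str:
    """Extract zip_path into dest_dir, refusing the whole archive on any zip-slip entry."""
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.infolist():
            # Resolve where the entry would be written and require it to stay under dest_root.
            target = os.path.realpath(os.path.join(dest_root, member.filename))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise UnsafeArchiveError(f"entry escapes target directory: {member.filename}")
        zf.extractall(dest_root)
    return dest_root
```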

- **Folder-capable types** (for example **`markdown`**, **`chatgpt-export`**, **`claude-export`**, **`notebooklm`** when given a directory): use a pre-made **ZIP**, **Choose folder** (4A'), the **4C** drop target, or **multi-select** many files to produce a client **ZIP** when the Hub decides on `client_zip` mode (see `web/hub/hub-client-import-zip.mjs`). The importer walks the tree and picks up each supported file (for `markdown`, **`.md` / `.markdown`** only; other extensions are skipped).
- **`pdf`** and **`docx`**: the importers require a **single file** on disk and **throw if the input is a directory** (`lib/importers/pdf.mjs`, `lib/importers/docx.mjs`). **Do not** upload a **ZIP** for these types in the Hub: the server extracts it to a directory, which these importers reject. For **many PDFs or DOCX files**, use **4B** in the Hub (**N sequential** `POST` imports, one file per request) or the CLI (**`knowtation import pdf` / `docx`** for folder paths).
- **Hosted MCP:** still **one** `import` call per file (there is no `import_batch` tool); see [PARITY-MATRIX-HOSTED.md](./PARITY-MATRIX-HOSTED.md).

**Reference:** [IMPORT-URL-AND-DOCUMENTS-PHASES.md](./IMPORT-URL-AND-DOCUMENTS-PHASES.md), Phases **4A**, **4A'**, **4B**, **4C** (4A + bulk copy, 4A' + 4B + 4C **shipped** on `feat/import-url-documents-mcp`).

### At a glance

| Source | Type | Format |
|--------|------|--------|
| **ChatGPT** | `chatgpt-export` | ZIP or folder export |
| **Claude** | `claude-export` | Chat + memory export |
| **Mem0** | `mem0-export` | JSON memory export |
| **Notion** | `notion` | API; page IDs + key |
| **Jira** | `jira-export` | CSV export |
| **NotebookLM** | `notebooklm` | Markdown or JSON |
| **Google Drive** | `gdrive` | Markdown folder |
| **Generic CSV** | `generic-csv` | Any UTF-8 CSV; one note per data row |
| **JSON (array)** | `json-rows` | `.json` file; root must be an array of objects |
| **Excel** | `excel-xlsx` | `.xlsx` (first sheet, one note per row) |
| **vCard** | `vcf` | `.vcf` / `.vcard`, one note per contact under `…/contacts/vcf/` |
| **Google Sheets (API)** | `google-sheets` | Live read via API (spreadsheet id, not a file); see the full reference below |
| **Linear** | `linear-export` | CSV export |
| **MIF** | `mif` | Memory Interchange Format |
| **Markdown** | `markdown` | File or folder |
| **PDF** | `pdf` | Single `.pdf` file (text extraction) |
| **DOCX** | `docx` | Single `.docx` file (Word → Markdown via mammoth) |
| **URL** | `url` | HTTPS URL (Hub **Import from URL** / `POST /api/v1/import-url`; CLI `knowtation import url …`) |
| **Audio** | `audio` | Whisper transcription |
| **Wallet CSV** | `wallet-csv` | Tx history; 11 formats |
| **Supabase** | `supabase-memory` | Memory table import |
| **OpenClaw** | `openclaw` | Agent memory + chats |
| **Hermes Agent** | `markdown` (from `~/.hermes/memories/`) | Agent MEMORY.md + USER.md |
| **Imports** | _(Hub UI + CLI)_ | Local files & team uploads |

**Live inbox capture** (Slack, Discord, Telegram, WhatsApp via automation) is not a CLI `import` type: use **`POST /api/v1/capture`** and the adapters in **§2**.

### Full reference

| Source type | Input (path or URI) | Description |
|-------------|---------------------|-------------|
| `chatgpt-export` | Path to OpenAI export ZIP or folder with `conversations.json` | ChatGPT data export (Settings → Data Controls → Export Data). One note per conversation or per message thread; frontmatter: `source: chatgpt`, `source_id`, `date`, optional `project`, `tags`. |
| `claude-export` | Path to Claude export ZIP or folder (chat history / memory) | Claude data export (Settings → Privacy → Export) and/or memory export. One note per conversation or per memory entry; `source: claude`, `source_id`, `date`. |
| `mem0-export` | Path to Mem0 export JSON | Mem0 memory export. One note per memory; `source: mem0`, `source_id`, `date`. |
| `notion` | Comma-separated Notion page IDs | Fetches pages as Markdown via the Notion API. Requires `NOTION_API_KEY`. One note per page; `source: notion`, `source_id: page_id`. |
| `jira-export` | Path to Jira CSV file (or folder with one .csv) | Jira Cloud/Server CSV export. One note per issue; `source: jira`, `source_id: issue key`, `title` from summary, plus **`import_column_headers`** (JSON array as a string) and a body section **All CSV fields (JSON)** with the **entire** row (every column) for search. The top of the body still foregrounds **Summary** and **Description** for quick reading. |
| `notebooklm` | Path to folder of .md files or to a .json export | NotebookLM: folder of Markdown (e.g. from Takeout/Apify) or JSON with a sources/conversations array. One note per file or entry; `source: notebooklm`. |
| `gdrive` | Path to folder of Markdown files | Google Drive: folder of .md files (e.g. from export or pandoc). One note per file; `source: gdrive`, `source_id` from filename. |
| `generic-csv` | Path to a **single** `.csv` file (UTF-8; optional BOM) | **Tabular** import: first row = headers, each following row = one note. For each row, the body lists **every** column as a bullet (**no columns dropped**), a **`## Full row (JSON)`** code block with the same values (stable keys; duplicate column names get `__2`, `__3` suffixes), and frontmatter **`import_column_headers`** (JSON string of the header list). Frontmatter: `source: csv-import`, **`title`** (includes the **filename** and, if a column named **`title`**, **`name`**, **`subject`**, **`summary`**, or **`label`** has a value, that value; otherwise **`filename (row N)`** for that data row), `source_id` (from an `id` / `uuid` / `key` column if present, else a content hash), `csv_file`, `row_index`, `date`. The note body H1 matches **`title`**. Limits: **10,000** data rows, **50 MB** file, **32,000** chars per cell (truncation is noted in the JSON when applied). **Google Sheets (download):** *File → Download → Comma-separated values (.csv)*, then import. |
| `json-rows` | Path to a **single** `.json` file whose **root** is a **JSON array of plain objects** (not arrays inside the root array) | One note per object. Frontmatter: `source: json-import`, `source_id` (from `id`, `uuid`, or `source_id` if present, else a hash of the object), `json_file`, `item_index`, optional `title` (from a `title` or `name` string), `date`. Body: the full object in a fenced `json` code block. Limits: **10,000** objects, **50 MB** file. **Not** a substitute for `claude-export` / `mem0-export` (those are platform-specific shapes). |
| `excel-xlsx` | Path to a **single** `.xlsx` file (Office Open XML) | **Tabular** import from the **first worksheet** only, same model as `generic-csv` (header row, one note per data row, bullets + **`## Full row (JSON)`** + **`import_column_headers`**, same **`title`** rules). Parsed with **`exceljs`** (not the legacy `xlsx` / SheetJS community package). `source: xlsx-import`, `xlsx_file`, `row_index`, `date`. **Legacy** `.xls` is not supported. Limits: **50 MB** file, **10,000** rows, **32,000** characters per cell (truncated). |
| `vcf` | Path to a **single** `.vcf` (or `.vcard`) file | One note per `BEGIN:VCARD … END:VCARD` block. `source: vcf-import`, `vcf_file`, `vcf_index`, `source_id` (vCard `UID` if present, else a hash), `title` from `FN` when possible. Path: **`<inbox or project>/contacts/vcf/…`**. Each note includes the raw card in a fenced block. Limits: **20 MB** file, **20,000** cards. |
| `google-sheets` | **Spreadsheet id** string (the long id in a `docs.google.com/spreadsheets/d/<id>` URL), not a file path for normal use | **Google Sheets API**, read-only. Same tabular model as `generic-csv` (default: **first tab**, **A1:ZZ10000**; override with `sheets_range` / Hub field **Range** or CLI `--sheets-range 'Sheet1!A1:E500'`), including the same **`title`**, **bullet list**, **`## Full row (JSON)`**, and **`import_column_headers`** model (the "file" part of the title is the **spreadsheet id**; add a `name`/`title`/… column for a readable second half; otherwise **`spreadsheet_id (row N)`**). Frontmatter: `source: google-sheets-import`, `title`, `spreadsheet_id`, `row_index`, `date`, **`import_column_headers`**. **Auth:** a **service account** JSON key. Set `GOOGLE_SERVICE_ACCOUNT_JSON` (inline JSON) or `GOOGLE_APPLICATION_CREDENTIALS` (path to the key file). If the spreadsheet is not owned by that project, it must be **shared with the service account email** (**Viewer** is enough). The **self-hosted bridge** and **any process running `runImport` for this type** need this env. **Hub / gateway:** `POST /api/v1/import` with `source_type=google-sheets`, `spreadsheet_id`, and optional `sheets_range`, with **no** `file` (the multipart file part can be omitted). If you only have a CSV, use **File → Download → Comma-separated values** and `generic-csv` instead. |
| `linear-export` | Path to Linear CSV file | Linear workspace export (CSV). One note per issue; `source: linear`, `source_id`, `title` when the export has a title field, and **`import_column_headers`** + **All CSV fields (JSON)** in the body so **all exported columns** remain searchable (not only id / title / description). |
| `mif` | Path to `.memory.md` or `.memory.json` or folder of MIF files | [Memory Interchange Format](https://mif-spec.dev/). MIF is Obsidian-native; files can be copied in as-is or normalized to our frontmatter. |
| `markdown` | Path to file or folder of Markdown files | Generic Markdown import. Preserve or infer frontmatter; add `source: markdown` and `date` if missing. For Evernote/Standard Notes/etc. exports that are already Markdown. **Hub:** a **ZIP of a folder tree** of `.md` / `.markdown` files is supported (the server extracts it, then walks the tree). |
| `pdf` | Path to a single `.pdf` file | Extracts plain text with PDF.js (via **unpdf**). One note under `inbox/imports/pdf/` (or project inbox); frontmatter: `source: pdf-import`, `source_id` (SHA-256 of file bytes), `pdf_file`, `pdf_pages`, `date`, `title`. Fails if no text can be extracted (e.g. some image-only scans). **Hub / hosted MCP:** multipart `POST /api/v1/import` or MCP **`import`** with `source_type: pdf` and file bytes (same as other file-based imports). **Hub:** upload the **`.pdf` file**, not a ZIP (a ZIP is extracted to a directory; this importer requires a file). |
| `docx` | Path to a single `.docx` file | Converts to Markdown with **mammoth** (Office Open XML only; not binary `.doc`). One note under `inbox/imports/docx/` (or project inbox); frontmatter: `source: docx-import`, `source_id` (SHA-256 of file bytes), `docx_file`, `date`, `title`. Fails on corrupt files or empty documents. **Hub / hosted MCP:** same multipart / **`import`** pattern as PDF. **Hub:** upload the **`.docx` file**, not a ZIP (same directory-vs-file rule as PDF). |
| `url` | **HTTPS URL string** (not a filesystem path) | Fetches the URL server-side with SSRF protections. One note under `inbox/imports/url/` (or project inbox); frontmatter: `source: url-import`, `source_id` (hash of canonical URL), `canonical_url`, `date`, `title`. Modes: **`auto`** (extract main article HTML when possible, else bookmark), **`bookmark`** (link + metadata only), **`extract`** (requires readable article HTML, otherwise errors). **Hub / hosted:** `POST /api/v1/import-url` JSON `{ "url", "mode"?, "project"?, "output_dir"?, "tags"? }`. **CLI:** `knowtation import url "https://…" [--url-mode auto\|bookmark\|extract]`. Paywalled or bot-blocked pages: use **`bookmark`**. |
| `audio` | Path to audio file or URL (e.g. wearable webhook payload) | **Primary path for in-Hub transcription** (self-hosted). OpenAI Whisper; **max ~25 MB** per file. One note per file; frontmatter: `source: audio`, `source_id`, `date`. |
| `wallet-csv` | Path to wallet/exchange transaction history CSV (or folder containing one .csv) | Converts wallet export files into vault notes with blockchain frontmatter. One note per row; `source: wallet-csv-import`, `source_id: tx_hash`, blockchain fields (`network`, `wallet_address`, `tx_hash`, `payment_status`, `amount`, `currency`, `direction`, `confirmed_at`, `block_height`). Notes land in `inbox/wallet-import/`. Auto-detects named formats: **Coinbase**, **Coinbase Pro**, **Exodus**, **ICP Rosetta**, **Kraken**, **Binance**, **MetaMask/Etherscan**, **Phantom (Solana)**, **Ledger Live**. Falls back to generic column alias matching for any other CSV. Re-import is safe: duplicate rows (same output path) are skipped. |
| `supabase-memory` | Supabase connection + table name | Import memory rows from a Supabase table. For users coming from database-centric stacks. |
| `openclaw` | Path to OpenClaw data export or memory dump | Import agent conversations and memory from [OpenClaw](https://github.com/openclaw/openclaw). One note per conversation or memory entry; `source: openclaw`, `source_id`, `date`. |
| **Hermes Agent** | `~/.hermes/memories/MEMORY.md`, `USER.md`, or `hermes memory export` JSON | [Hermes Agent](https://github.com/NousResearch/hermes-agent) built-in memory lives under `~/.hermes/memories/`. Import with `knowtation import markdown` on those files, or export via `hermes memory export` / `hermes backup` first. Tag notes `hermes` for filters. Wire ongoing access via Hub API + MCP (Settings → Integrations). |
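
The `generic-csv` row model (reused by `excel-xlsx` and `google-sheets`) can be sketched in Python to make the rules concrete. This is illustrative only: `csv_rows_to_notes` is a hypothetical function, and the `filename: value` title punctuation, 1-based row numbers, and the 16-character SHA-256 prefix used as the "content hash" are assumptions. The row, file-size, and per-cell limits are not enforced, and `date` is omitted.

```python
import csv
import hashlib
import io
import json
from typing import Any

TITLE_COLUMNS = ("title", "name", "subject", "summary", "label")
ID_COLUMNS = ("id", "uuid", "key")
FENCE = "`" * 3  # avoids a literal fence line inside this Markdown example


def dedupe_headers(headers: list) -> list:
    """Suffix repeated column names with __2, __3, ... so JSON keys stay unique."""
    counts: dict = {}
    out = []
    for h in headers:
        counts[h] = counts.get(h, 0) + 1
        out.append(h if counts[h] == 1 else f"{h}__{counts[h]}")
    return out


def csv_rows_to_notes(text: str, filename: str) -> list:
    reader = csv.reader(io.StringIO(text.lstrip("\ufeff")))  # tolerate a UTF-8 BOM
    headers = dedupe_headers(next(reader, []))
    notes: list = []
    for row_index, cells in enumerate(reader, start=1):
        # Keep every column; short rows are padded with "" (extra cells are ignored here).
        row = {h: (cells[i] if i < len(cells) else "") for i, h in enumerate(headers)}
        label = next((row[c] for c in TITLE_COLUMNS if row.get(c)), None)
        title = f"{filename}: {label}" if label else f"{filename} (row {row_index})"
        stable_id = next((row[c] for c in ID_COLUMNS if row.get(c)), None)
        source_id = stable_id or hashlib.sha256(
            json.dumps(row, sort_keys=True).encode("utf-8")).hexdigest()[:16]
        lines = [f"# {title}", ""] + [f"- **{h}:** {v}" for h, v in row.items()]
        lines += ["", "## Full row (JSON)", FENCE + "json",
                  json.dumps(row, ensure_ascii=False, indent=2), FENCE]
        note: dict[str, Any] = {
            "frontmatter": {
                "source": "csv-import", "title": title, "source_id": source_id,
                "csv_file": filename, "row_index": row_index,
                "import_column_headers": json.dumps(headers),
            },
            "body": "\n".join(lines),
        }
        notes.append(note)
    return notes
```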

> **Video:** The CLI and MCP still support `knowtation import video <file>` (same Whisper pipeline as audio), but video files are usually over 25 MB. Export the audio first, or transcribe with another service and import the result as Markdown.

**Options (common):** `--project <slug>`, `--output-dir <vault-path>`, `--tags tag1,tag2`, `--dry-run`, `--json`. **`google-sheets` only:** `--sheets-range 'A1-notation'`. If `--output-dir` is omitted, the default is `vault/inbox/`, or `vault/projects/<project>/inbox/` when `--project` is set.
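
The output-folder default is a simple precedence rule. A hypothetical `default_output_dir` helper makes it explicit; it does not model the importer-specific subfolders listed in the table above (for example `inbox/imports/pdf/` or `inbox/wallet-import/`).

```python
from typing import Optional


def default_output_dir(project: Optional[str] = None,
                       output_dir: Optional[str] = None) -> str:
    """Resolve the target folder using the documented CLI defaults."""
    if output_dir:
        return output_dir  # an explicit --output-dir always wins
    if project:
        return f"vault/projects/{project}/inbox/"
    return "vault/inbox/"


print(default_output_dir())                         # vault/inbox/
print(default_output_dir(project="research"))       # vault/projects/research/inbox/
print(default_output_dir("research", "imports/x"))  # imports/x
```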

> **See also:** [Templates and Skills](./TEMPLATES-AND-SKILLS.md): starter vault templates, agent skill packs, and how they compose with import sources.

---

## 4. Platform-specific notes

### 4.1 ChatGPT (OpenAI)

- **How users get data:** Settings → Data Controls → Export Data (or the Privacy Portal). OpenAI emails a link to a ZIP containing `conversations.json` (and sometimes `chat.html`). The link expires after 24 hours; the export itself can take up to 7 days.
- **Format:** `conversations.json` is a tree of messages (a mapping of id → `{ message, parent, children }`). Each message has `content.parts`, `author.role`, and timestamps.
- **Importer behavior:** Parse `conversations.json`; for each conversation, produce one note (or one per thread) whose body is the concatenated or structured transcript. Frontmatter: `source: chatgpt`, `source_id: <conversation-id>`, `date`, `title` from the conversation title. Optional: one note per message for fine-grained search (heavier).
- **Third-party:** Browser extensions (e.g. ChatGPT Exporter) can export per-conversation JSON/Markdown/HTML; the importer can accept a folder of such files and treat it as `chatgpt-export`, or as `markdown` with `source: chatgpt`.
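
Turning the message tree into a transcript means walking `mapping` from the root. A minimal sketch, assuming the shape described above (`parent` is `None` on the root node, `children` lists child ids): `linearize_conversation` and `to_markdown` are hypothetical helpers, they follow only the first child at each branch (so edited or regenerated alternatives are skipped), and non-string parts such as attachments are dropped.

```python
from typing import Optional


def linearize_conversation(conv: dict) -> list:
    """Flatten one conversation's message tree into (role, text) pairs."""
    mapping = conv.get("mapping") or {}
    roots = [node_id for node_id, node in mapping.items() if node.get("parent") is None]
    transcript = []
    visited = set()
    node_id: Optional[str] = roots[0] if roots else None
    while node_id is not None and node_id in mapping and node_id not in visited:
        visited.add(node_id)  # guard against malformed cyclic exports
        node = mapping[node_id]
        message = node.get("message")
        if message:
            parts = (message.get("content") or {}).get("parts") or []
            # Keep text parts only; non-string parts (e.g. attachments) are skipped.
            text = "\n".join(p for p in parts if isinstance(p, str)).strip()
            if text:
                role = (message.get("author") or {}).get("role", "unknown")
                transcript.append((role, text))
        children = node.get("children") or []
        node_id = children[0] if children else None  # first branch only
    return transcript


def to_markdown(conv: dict) -> str:
    """Render a transcript note body titled with the conversation title."""
    lines = [f"# {conv.get('title') or 'Untitled conversation'}", ""]
    for role, text in linearize_conversation(conv):
        lines += [f"**{role}:** {text}", ""]
    return "\n".join(lines).rstrip() + "\n"
```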

### 4.2 Claude (Anthropic)

- **How users get data:** Settings → Privacy → Export data (chat history). Memory: Settings → Capabilities → View and edit your memory → export. (Claude also supports importing memory from other AI providers via a prompt.)
- **Format:** The export is account data (format may vary). The memory export is a user-facing list that can be copied; an API or file format is TBD.
- **Importer behavior:** Same pattern as ChatGPT: one note per conversation or per memory entry; `source: claude`, `source_id`, `date`. Third-party tools (e.g. claude-exporter) produce JSON/Markdown; we can accept that as `claude-export`, or as `markdown` with `source: claude`.

### 4.3 Mem0

- **How users get data:** Mem0 API: call `create_memory_export()` with a schema, then retrieve the result via `get_memory_export()`. Returns JSON (Pydantic-style schema).
- **Importer behavior:** Map Mem0 memories to vault notes. Each memory → one note; frontmatter can include Mem0 metadata (e.g. `mem0_id`, `user_id`) plus our `source: mem0`, `source_id`, `date`. Optionally support MIF as output so Mem0 users can later use MIF-native tools.

### 4.4 Notion

- **How users get data:** Notion API: create an integration at notion.so/my-integrations, share pages with it, then use the page IDs (from the page URL: notion.so/workspace/page_id).
- **Importer behavior:** `knowtation import notion <page_id>` or `knowtation import notion "id1,id2,id3"`. Requires `NOTION_API_KEY`. Fetches each page as Markdown via `GET /v1/pages/{page_id}/markdown`; one note per page with `source: notion`, `source_id: page_id`.
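
Page IDs copied from the browser come in several forms. A hypothetical normalizer, assuming Notion page URLs end with the 32-hex-digit page id (optionally after a title slug and before a query string such as `?pvs=4`); whether the `notion` importer itself accepts full URLs is not stated here, so this only shows how to produce clean ids for the comma-separated argument.

```python
import re
from urllib.parse import urlparse

_TRAILING_ID = re.compile(r"([0-9a-fA-F]{32})$")


def notion_page_id(value: str) -> str:
    """Normalize a raw id, dashed UUID, or Notion page URL to a dashed UUID."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        # Drop query/fragment (e.g. ?pvs=4) and keep the last path segment.
        value = urlparse(value).path.rstrip("/").rsplit("/", 1)[-1]
    match = _TRAILING_ID.search(value.replace("-", ""))
    if not match:
        raise ValueError(f"no 32-hex-digit page id found in {value!r}")
    h = match.group(1).lower()
    return f"{h[:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:]}"


def page_ids_from_arg(arg: str) -> list:
    """Split the comma-separated CLI argument and normalize each entry."""
    return [notion_page_id(part) for part in arg.split(",") if part.strip()]
```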

### 4.5 Jira and Linear

- **Jira:** Export from Jira (list or search → Export CSV). Importer: `knowtation import jira-export /path/to/export.csv --output-dir imports/jira`. Maps Issue key, Summary, Description, and Project to vault notes.
- **Linear:** Export from Linear (Command menu → Export data → CSV). Importer: `knowtation import linear-export /path/to/linear-export.csv --project myproject`.

### 4.6 NotebookLM and Google Drive

- **NotebookLM:** Accepts (1) a folder of Markdown files (e.g. from Google Takeout or a third-party Apify export), or (2) a JSON file with an array of entries (`content`, `id`, `title`). One note per file or entry; `source: notebooklm`.
- **Google Drive:** Accepts a folder of Markdown files. Export Docs as .docx and convert them to .md (e.g. with pandoc), or use a sync script. Importer: `knowtation import gdrive /path/to/folder`; `source: gdrive`, `source_id` from filename.
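
For the NotebookLM JSON form, each entry becomes one note. A minimal sketch, assuming entries carry the `content`, `id`, and `title` keys listed above; accepting a wrapper object with `sources` or `conversations` keys is an assumption based on the full-reference wording, and falling back to the array index when `id` is missing is a hypothetical choice (it is not stable across exports, so dedupe is weaker). `notebooklm_entries` is not the importer's API.

```python
import json
from typing import Any


def notebooklm_entries(raw: str) -> list:
    """Turn a NotebookLM-style JSON export into note dicts (frontmatter + body)."""
    data: Any = json.loads(raw)
    if isinstance(data, dict):
        # Assumption: a wrapper object keeps its entries under "sources" or "conversations".
        data = data.get("sources") or data.get("conversations") or []
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of entries")
    notes = []
    for index, entry in enumerate(data):
        if not isinstance(entry, dict) or not entry.get("content"):
            continue  # nothing to write for this entry
        title = entry.get("title") or f"NotebookLM entry {index + 1}"
        notes.append({
            "frontmatter": {
                "source": "notebooklm",
                # Hypothetical fallback: the array index is not stable across exports.
                "source_id": str(entry.get("id", index)),
                "title": title,
            },
            "body": f"# {title}\n\n{entry['content']}\n",
        })
    return notes
```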

### 4.7 Confluence

- **How users get data:** Confluence has no native Markdown export. Use third-party tools (e.g. confluence-cli, nodejs-confluence-export) to export a space or page to a folder of Markdown files.
- **Importer behavior:** Export to a folder with one of those tools, then run `knowtation import markdown /path/to/confluence-export --output-dir imports/confluence --tags confluence`. Optional: add a thin `confluence-export` importer that accepts the same folder and sets `source: confluence` and `source_id` from the filename.

### 4.8 MIF (Memory Interchange Format)

- **What it is:** [mif-spec.dev](https://mif-spec.dev/): a vendor-neutral AI memory format with a dual representation, `.memory.md` (Markdown + YAML frontmatter) and `.memory.json` (JSON-LD). Obsidian-native.
- **Importer behavior:** Copy `.memory.md` files into the vault (they are already valid Obsidian notes). Optional: normalize to our frontmatter (e.g. map `mif:id` to `source_id`, add `source: mif`); the body does not need to change. This enables future interop with Mem0, Zep, etc. if they adopt MIF.
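
The optional normalization only adds keys, so nothing from the MIF note is lost. A sketch operating on already-parsed frontmatter (YAML parsing omitted); `normalize_mif_frontmatter` is a hypothetical name, and the `mif:id` key is taken from the mapping example above rather than verified against the MIF specification.

```python
from typing import Any


def normalize_mif_frontmatter(fm: dict) -> dict:
    """Return a copy of MIF frontmatter with Knowtation's source fields added."""
    out: dict[str, Any] = dict(fm)  # keep every original MIF key
    out.setdefault("source", "mif")
    if "source_id" not in out and "mif:id" in fm:
        out["source_id"] = str(fm["mif:id"])
    return out
```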

### 4.9 Audio and video (including wearables)

- **Product note:** **Audio** is the recommended path for in-app transcription (smaller files, usually under OpenAI's **25 MB** per-request limit). **Video** in the **self-hosted Hub** import dialog is **coming soon**; use **`knowtation import video`** from the CLI (same limit), or strip the audio / transcribe elsewhere and import **Markdown**.
- **Smart glasses / wearables:** Devices (e.g. TranscribeGlass, Omi, Ray-Ban + GlassFlow, ViveGlass) often produce transcripts via an app, webhook, or export. Omi supports webhooks for real-time transcript delivery. TranscribeGlass and similar devices may export text or send it to a URL.
- **Importer behavior:** `import audio <file>` or `import video <file>` transcribes via OpenAI Whisper (`OPENAI_API_KEY` required) → one note with the transcript as its body; frontmatter `source: audio` or `source: video`, `source_id`, `date`. Formats: mp3, mp4, mpeg, mpga, m4a, wav, webm. The API rejects uploads over **25 MB** (see `WHISPER_MAX_FILE_BYTES` in `lib/transcribe.mjs`). **Self-hosted:** if **ffmpeg** is available, Knowtation may transcode oversized files down first (`lib/ffmpeg-whisper-transcode.mjs`; disable via `transcription.transcode_oversized: false` or `KNOWTATION_TRANSCODE_OVERSIZED=0`). Webhook receivers (e.g. Omi) can write transcripts directly to the inbox per the message-interface contract.
- **Past blogs/videos:** Users export blog text or video transcripts (or use our transcription) and import them as `markdown` or `audio`/`video` so historical content lives in the vault.
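
The size limit and the self-hosted fallback combine into a small decision. A hypothetical `plan_transcription` sketch that calls neither Whisper nor ffmpeg. Assumptions, also noted in the code: "25 MB" is read as 25 * 1024 * 1024 bytes (the authoritative value is `WHISPER_MAX_FILE_BYTES`), the `transcode_enabled` flag stands in for `transcription.transcode_oversized`, and ffmpeg availability is checked with `shutil.which`.

```python
import os
import shutil

# Assumption: "25 MB" read as 25 * 1024 * 1024 bytes; the authoritative value is
# WHISPER_MAX_FILE_BYTES in lib/transcribe.mjs.
WHISPER_LIMIT_BYTES = 25 * 1024 * 1024


def plan_transcription(path: str, transcode_enabled: bool = True) -> str:
    """Return 'send', 'transcode', or 'reject' for a local audio/video file."""
    if os.path.getsize(path) <= WHISPER_LIMIT_BYTES:
        return "send"  # small enough for a single Whisper request
    # transcode_enabled stands in for transcription.transcode_oversized.
    env_off = os.environ.get("KNOWTATION_TRANSCODE_OVERSIZED") == "0"
    if transcode_enabled and not env_off and shutil.which("ffmpeg"):
        return "transcode"  # shrink with ffmpeg, then send
    return "reject"  # the API would refuse the upload
```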

---

## 5. CLI surface (summary)

- **Command:** `knowtation import <source-type> <input> [--project <slug>] [--output-dir <path>] [--tags t1,t2] [--dry-run] [--json]`
- **Behavior:** Runs the importer for the given source type and writes notes to the vault; optionally runs the indexer afterwards (config or flag). Output: a list of written paths; with `--json`, a machine-readable summary.
- **Exit codes:** 0 success, 1 usage error, 2 runtime error (same as the rest of the CLI).
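
Scripts that wrap the CLI can branch on these exit codes. A hypothetical sketch: `run_import` shells out with `--json`, and `interpret_result` maps the documented codes to a parsed result or an exception. It assumes the `--json` summary is a single JSON document on stdout; the summary's fields are not specified here, so it is returned as parsed without further interpretation.

```python
import json
import subprocess
from typing import Any, Optional

EXIT_MEANINGS = {0: "success", 1: "usage error", 2: "runtime error"}


def interpret_result(returncode: int, stdout: str, stderr: str) -> Any:
    """Return the parsed --json summary, or raise with the documented meaning."""
    if returncode == 0:
        return json.loads(stdout) if stdout.strip() else None
    meaning = EXIT_MEANINGS.get(returncode, "unexpected exit code")
    raise RuntimeError(f"knowtation import exited {returncode} ({meaning}): {stderr.strip()}")


def run_import(source_type: str, input_path: str, project: Optional[str] = None,
               dry_run: bool = True) -> Any:
    cmd = ["knowtation", "import", source_type, input_path, "--json"]
    if project:
        cmd += ["--project", project]
    if dry_run:
        cmd.append("--dry-run")  # preview without writing notes
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return interpret_result(proc.returncode, proc.stdout, proc.stderr)
```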

---

## 6. What we're not forgetting

- **Any audio:** Smart glasses, wearables, past blogs/videos, recordings → transcription (when under **25 MB**) or an external transcript → vault note with `source` and `source_id`.
- **Any knowledge base:** Google Drive, NotebookLM, ChatGPT, Claude, Mem0, Evernote/Standard Notes (as Markdown), and MIF each have a defined `import` path into the vault.
- **Any agent or business use:** Once in the vault, content is searchable, project/tag-filterable, and usable for blogs, podcasts, videos, marketing, analysis, and writing. No second-class content.

Implementors: follow the [SPEC.md](./SPEC.md) import contracts and extend `lib/importers/` with tests.

---

## 7. How to run (examples)

**Markdown** (file or folder):
```bash
knowtation import markdown ./my-notes.md --output-dir imports/notes
knowtation import markdown ./exported-folder --project myproject
```

**URL** (https only; server-side fetch with SSRF limits):

```bash
knowtation import url "https://example.com/article" --project research --tags reading
knowtation import url "https://example.com/paywalled" --url-mode bookmark --dry-run --json
```

**Hub / hosted:** Import modal → paste URL → choose **URL capture mode** → Import; or `POST /api/v1/import-url` with a JSON body (same route on the self-hosted Hub and the hosted gateway → bridge).
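
The request the Hub modal sends can also be built by hand. A sketch with a placeholder Hub base URL that sends only the documented body fields `url`, `mode`, `project`, and `tags`; authentication headers are omitted (see HUB-API.md), and encoding `tags` as a JSON array of strings is an assumption. `build_import_url_request` is hypothetical. The client-side https and mode checks mirror the rules above, while SSRF protection remains the server's job.

```python
import json
import urllib.request
from typing import Optional, Sequence
from urllib.parse import urlparse

URL_MODES = {"auto", "bookmark", "extract"}


def build_import_url_request(hub_base: str, url: str, mode: str = "auto",
                             project: Optional[str] = None,
                             tags: Optional[Sequence[str]] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/import-url request."""
    if urlparse(url).scheme != "https":
        raise ValueError("only https URLs are accepted")
    if mode not in URL_MODES:
        raise ValueError(f"mode must be one of {sorted(URL_MODES)}")
    body: dict = {"url": url, "mode": mode}
    if project:
        body["project"] = project
    if tags:
        body["tags"] = list(tags)  # assumption: a JSON array of strings
    return urllib.request.Request(
        hub_base.rstrip("/") + "/api/v1/import-url",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```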

**ChatGPT export:** extract the OpenAI ZIP first, then:
```bash
knowtation import chatgpt-export /path/to/extracted-folder --output-dir imports/chatgpt --tags chatgpt
```
The folder must contain `conversations.json` (Settings → Data Controls → Export Data).

**Claude export:** a folder of .md files (from third-party exporters) or JSON:
```bash
knowtation import claude-export /path/to/claude-export-folder --project myproject
```

**MIF** (.memory.md or folder):
```bash
knowtation import mif ./my-memories.memory.md --output-dir imports/mif
```

**Mem0 export** (JSON file):
```bash
knowtation import mem0-export ./mem0-export.json --project memories
```

**Notion** (requires `NOTION_API_KEY`; page IDs from Notion page URLs):
```bash
export NOTION_API_KEY=your_integration_secret
knowtation import notion "page-uuid-1,page-uuid-2" --output-dir imports/notion --project myproject
```

**Jira** (CSV from Jira export):
```bash
knowtation import jira-export ./jira-export.csv --output-dir imports/jira --tags jira
```

**NotebookLM** (folder of .md or JSON export):
```bash
knowtation import notebooklm ./notebooklm-export-folder --output-dir imports/notebooklm
knowtation import notebooklm ./notebooklm-sources.json --project research
```

**Google Drive** (folder of Markdown files):
```bash
knowtation import gdrive /path/to/docs-as-markdown --output-dir imports/gdrive --project docs
```

**Linear** (CSV from Linear export):
```bash
knowtation import linear-export ./linear-export.csv --output-dir imports/linear --project myapp
```

**Audio** (transcription via OpenAI Whisper; requires `OPENAI_API_KEY`; **max ~25 MB** per file):
```bash
knowtation import audio ./recording.m4a --project born-free --output-dir media/audio
```

**Video** (same pipeline and limit; prefer exporting **audio** for long content):
```bash
knowtation import video ./short-clip.mp4 --output-dir media/video
```
| 273 | |
**Wallet / exchange CSV** (one note per transaction row; format auto-detected):
```bash
# Generic CSV with any recognized headers
knowtation import wallet-csv ./wallet-export.csv --tags payment,on-chain

# Coinbase (Date, Transaction Type, Asset, Quantity Transacted, …)
knowtation import wallet-csv ./coinbase-export.csv --tags coinbase,payment

# Coinbase Pro / Advanced Trade (portfolio, type, time, amount, amount/balance unit, …)
knowtation import wallet-csv ./coinbase-pro-fills.csv --tags coinbase-pro

# Exodus (DATE, TYPE, FROMAMOUNT, FROMCURRENCY, TXID, …)
knowtation import wallet-csv ./exodus-transactions.csv --tags exodus

# ICP Rosetta (hash, block_index, timestamp, type, account, amount)
knowtation import wallet-csv ./icp-rosetta.csv --tags icp,on-chain

# Kraken ledger export (txid, refid, time, type, aclass, asset, amount, fee, balance)
knowtation import wallet-csv ./kraken-ledgers.csv --tags kraken,payment

# Binance deposit/withdrawal (Date(UTC), Coin, Network, Amount, TXID, Status, …)
knowtation import wallet-csv ./binance-history.csv --tags binance

# Binance spot wallet history (UTC_Time, Account, Operation, Coin, Change, Remark)
knowtation import wallet-csv ./binance-spot-wallet.csv --tags binance

# MetaMask / Etherscan address export (Txhash, Blockno, DateTime (UTC), From, To,
# Value_IN(ETH), Value_OUT(ETH), Status, …)
knowtation import wallet-csv ./etherscan-export.csv --tags metamask,eth

# Phantom wallet (Transaction ID, Date, Type, Amount, Token, Status, Fee (SOL), Signature)
knowtation import wallet-csv ./phantom-history.csv --tags phantom,solana

# Ledger Live (Operation Date, Currency ticker, Operation Amount, Operation Hash, …)
knowtation import wallet-csv ./ledger-live-export.csv --tags ledger

# From Hub UI: Import modal → Source type → Wallet / exchange CSV → upload .csv file
```

Notes land in `inbox/wallet-import/<YYYY-MM-DD>-<tx_hash_prefix>.md`.
Re-importing the same CSV is safe: rows with an existing output path are skipped.

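The re-import guarantee follows from the output path being a pure function of the row. Below is a minimal sketch that assumes `<tx_hash_prefix>` is the first 12 characters of the hash; the prefix length is not stated here, and 12 matches the `source_id` in the example note further down. It also assumes the `YYYY-MM-DD` date has already been extracted from the row. `note_path` and `write_if_new` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the wallet-import output path and skip rule.
# Assumption: <tx_hash_prefix> is the first 12 characters of the tx hash.
PREFIX_LEN=12

# note_path VAULT DAY TX_HASH: prints the note path for one CSV row.
note_path() {
  local vault=$1 day=$2 hash=$3
  printf '%s/inbox/wallet-import/%s-%s.md\n' "$vault" "$day" "${hash:0:$PREFIX_LEN}"
}

# write_if_new PATH CONTENT: writes only if PATH is absent.
# Returns 0 when written and 1 when skipped, so a second run over the
# same CSV writes nothing.
write_if_new() {
  local path=$1 content=$2
  [ -e "$path" ] && return 1
  mkdir -p "$(dirname "$path")"
  printf '%s\n' "$content" > "$path"
}
```

Because the path depends only on the row's date and hash prefix, a second run maps every row to a file that already exists. One consequence of this design is that a corrected row (same hash, different amount) is also skipped. This sketch has no update path.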
### Named format auto-detection

The importer fingerprints the CSV header to pick the right normaliser automatically.
No user action is required: upload or pass the CSV as-is.

| Format | Fingerprint headers | `network` set to |
|--------|---------------------|------------------|
| **Coinbase** | `Quantity Transacted`, `Transaction Type` | `coinbase` |
| **Coinbase Pro** | `portfolio`, `amount/balance unit` | `coinbase-pro` |
| **Exodus** | `FROMAMOUNT`, `FROMCURRENCY` | _(from row)_ |
| **ICP Rosetta** | `hash`, `block_index` (≤10 columns) | `icp` |
| **Kraken** | `refid`, `aclass` or `asset` | `kraken` |
| **Binance deposit/withdrawal** | `Date(UTC)`, `Coin` | from `Network` column |
| **Binance spot wallet** | `UTC_Time`, `Coin` | `binance` |
| **MetaMask / Etherscan** | `Value_IN(ETH)` or `Value_OUT(ETH)` or `Blockno` | `ethereum` |
| **Phantom** | `Signature` or `fee (sol)`, `token` | `solana` |
| **Ledger Live** | `Operation Date`, `Currency ticker` | inferred from ticker |
| **Generic** | any CSV with recognised aliases | from `network` column |

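Below is a minimal sketch of header fingerprinting that mirrors the table above. It assumes checks run in table order, header names are compared case-insensitively, the first match wins, and anything unmatched falls back to generic. It reads the Kraken row as `refid` plus either `aclass` or `asset`. It reads the Phantom row as `Signature` alone, or `fee (sol)` together with `token`. Both readings are assumptions. `detect_format` and the labels it prints are illustrative, not the importer's internal identifiers.

```bash
#!/usr/bin/env bash
# Hypothetical header fingerprinting for wallet CSVs, mirroring the table above.
# Assumptions: exact header names compared case-insensitively, first match wins,
# no commas inside quoted header names.

# has NAME: succeeds if NAME is one of the caller's HEADERS (bash dynamic scoping).
has() {
  local col
  for col in "${HEADERS[@]}"; do
    [ "$col" = "$1" ] && return 0
  done
  return 1
}

# detect_format HEADER_LINE: prints the name of the matching format.
detect_format() {
  local line h i
  local -a HEADERS
  # Lowercase, drop CR and double quotes, then split on commas.
  line=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '\r"')
  IFS=',' read -r -a HEADERS <<< "$line"
  for i in "${!HEADERS[@]}"; do
    h=${HEADERS[i]}
    h=${h#"${h%%[![:space:]]*}"}   # strip leading whitespace
    h=${h%"${h##*[![:space:]]}"}   # strip trailing whitespace
    HEADERS[i]=$h
  done

  if has "quantity transacted" && has "transaction type"; then echo coinbase
  elif has "portfolio" && has "amount/balance unit"; then echo coinbase-pro
  elif has "fromamount" && has "fromcurrency"; then echo exodus
  elif has "hash" && has "block_index" && [ "${#HEADERS[@]}" -le 10 ]; then echo icp-rosetta
  elif has "refid" && { has "aclass" || has "asset"; }; then echo kraken
  elif has "date(utc)" && has "coin"; then echo binance-deposit-withdrawal
  elif has "utc_time" && has "coin"; then echo binance-spot
  elif has "value_in(eth)" || has "value_out(eth)" || has "blockno"; then echo etherscan
  elif has "signature" || { has "fee (sol)" && has "token"; }; then echo phantom
  elif has "operation date" && has "currency ticker"; then echo ledger-live
  else echo generic
  fi
}
```

Order matters in a first-match scheme. For example, the ICP Rosetta check requires the exact header `hash`, so an Etherscan `Txhash` column does not trigger it.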
### Generic column alias table

For CSVs not matching a named format, the importer resolves these aliases (case-insensitive):

| Canonical field | CSV column aliases |
|-----------------|--------------------|
| `tx_hash` | `txhash`, `transaction_hash`, `hash`, `tx id`, `txid`, `transaction id`, `transaction_id` |
| `confirmed_at` | `date`, `timestamp`, `time`, `confirmed at`, `confirmed_at`, `block time`, `block_time` |
| `amount` | `amount`, `value`, `quantity` |
| `currency` | `currency`, `asset`, `token`, `coin`, `symbol` |
| `direction` | `type`, `direction`, `side`; values `buy`/`receive`/`deposit`/`earn` → `received`, `sell`/`send`/`withdrawal` → `sent`, `swap`/`trade` kept as-is |
| `payment_status` | `status`; values `completed`/`success`/`confirmed` → `settled`, `pending` → `pending`, `failed`/`error`/`rejected` → `failed` |
| `wallet_address` | `from`, `to`, `address`, `wallet`, `sender`, `recipient`, `from_address`, `to_address` |
| `network` | `network`, `chain`, `blockchain` |
| `block_height` | `block`, `block number`, `block_number`, `block height`, `block_height` |

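Below is a minimal sketch of the alias lookup and the `direction` / `payment_status` value mapping from the table above. Only three canonical fields are wired up to keep it short, and the rest would follow the same pattern. `aliases_for`, `resolve_column`, `normalize_direction`, and `normalize_status` are hypothetical names. Two behaviours are assumptions the table does not settle: aliases are tried in table order (the first alias present in the header wins), and unrecognised values pass through unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical alias resolution and value normalisation for generic wallet CSVs.
# Only three canonical fields are wired up; the others follow the same pattern.

# aliases_for FIELD: prints FIELD's aliases from the table, '|'-separated.
aliases_for() {
  case $1 in
    tx_hash)  echo "txhash|transaction_hash|hash|tx id|txid|transaction id|transaction_id" ;;
    amount)   echo "amount|value|quantity" ;;
    currency) echo "currency|asset|token|coin|symbol" ;;
    *)        return 1 ;;
  esac
}

# resolve_column FIELD HEADER_LINE: prints the 0-based index of the column
# matching FIELD. Assumption: aliases are tried in table order.
resolve_column() {
  local list header alias i h
  local -a cols names
  list=$(aliases_for "$1") || return 1
  header=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]' | tr -d '\r"')
  IFS=',' read -r -a cols <<< "$header"
  IFS='|' read -r -a names <<< "$list"
  for alias in "${names[@]}"; do
    for i in "${!cols[@]}"; do
      h=${cols[i]}
      h=${h#"${h%%[![:space:]]*}"}   # strip leading whitespace
      h=${h%"${h##*[![:space:]]}"}   # strip trailing whitespace
      if [ "$h" = "$alias" ]; then
        echo "$i"
        return 0
      fi
    done
  done
  return 1
}

# normalize_direction VALUE: maps raw type/side values onto received/sent.
normalize_direction() {
  case $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') in
    buy|receive|deposit|earn) echo received ;;
    sell|send|withdrawal)     echo sent ;;
    *)                        printf '%s\n' "$1" ;;  # swap/trade (and unknowns) kept as-is
  esac
}

# normalize_status VALUE: maps raw status values onto settled/pending/failed.
normalize_status() {
  case $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') in
    completed|success|confirmed) echo settled ;;
    pending)                     echo pending ;;
    failed|error|rejected)       echo failed ;;
    *)                           printf '%s\n' "$1" ;;  # assumption: unknowns pass through
  esac
}
```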
### Example note produced

```markdown
---
title: ICP transfer – 500 ICP sent
date: 2026-04-02
source: wallet-csv-import
source_id: 8a3c0d1b2e4f
network: icp
wallet_address: rrkah-fqaaa-aaaaa-aaaaq-cai
tx_hash: 8a3c0d1b2e4f
payment_status: settled
amount: 500
currency: ICP
direction: sent
confirmed_at: 2026-04-02T18:12:44Z
block_height: 12345678
tags: [payment, on-chain, icp-tx]
---

Transaction imported from wallet CSV export.
Amount: 500 ICP | Direction: sent | Status: settled
Block: 12,345,678 | Confirmed: 2026-04-02 18:12:44 UTC
```

**Dry run** (preview without writing):
```bash
knowtation import markdown ./notes --dry-run --json
```