muse code port: semantic cross-language porting engine
Vision
Every time a codebase is rewritten in a new language, the same tragedy plays out: engineers spend weeks manually reading Python (or whatever the source is), trying to understand what a function actually does at a semantic level, then transcribing it into Rust — re-discovering edge cases that the original author knew but never documented, missing invariants embedded in call ordering, and breaking contracts that existed only as tribal knowledge.
Muse already understands code at a level that no VCS has ever attempted. It has:
- A multi-language AST parser (muse/plugins/code/ast_parser.py) that extracts SymbolRecord trees (content_id, body_hash, signature_id) for Python, Rust, TypeScript, Go, Java, C, C++, Swift, Kotlin, Ruby, and more
- A call graph (muse/plugins/code/_callgraph.py), both forward (ForwardGraph: what does this function call?) and reverse (ReverseGraph: what calls this?)
- Transitive blast-radius analysis (muse/cli/commands/impact.py): the full dependency closure of any symbol
- Type-health analysis (muse/core/type_analysis.py): annotation coverage, Any propagation chains, signature drift over history
- Symbol diff (muse/plugins/code/symbol_diff.py): rename vs. move vs. signature change vs. implementation change, all structurally classified
- Refactor detection (muse/cli/commands/detect_refactor.py): semantic operation history across commits
- A persistent symbol cache (muse/core/symbol_cache.py): sha256 → SymbolTree in msgpack, 60× faster than re-parsing
- Dead code detection (muse/cli/commands/dead.py), coupling analysis (coupling.py), gravity/centrality (gravity.py), hotspots (hotspots.py)
- A query engine (muse/core/query_engine.py): walk history, evaluate predicates, extract matches across commits
muse code port is the natural next command that emerges from this stack. It answers: "Given that I want to rewrite this codebase in another language, what exactly needs to happen, in what order, and where are the hard parts?"
The Command
muse code port --from Python --to Rust [path/to/file.py] [flags]
What it produces
Phase 1 — Inventory
A complete, ordered porting manifest. Not a flat list of files — a topologically sorted work queue based on the dependency graph:
muse code port --from Python --to Rust
Porting inventory: Python → Rust
Snapshot: a3f2c9e1 497 files 14 642 symbols
Phase 1 — Leaf modules (no internal deps, port first)
muse/core/errors.py 12 symbols 0 internal deps ██░░░░ typed 42%
muse/core/validation.py 8 symbols 0 internal deps █████░ typed 83%
muse/core/_types.py 31 symbols 0 internal deps ██████ typed 100%
Phase 2 — Core layer (depends only on Phase 1)
muse/core/object_store.py 24 symbols 3 deps ████░░ typed 67%
muse/core/schema.py 18 symbols 2 deps ██░░░░ typed 31%
muse/plugins/code/ast_parser.py 89 symbols 5 deps ░░░░░░ typed 12%
...
Phase 8 — CLI layer (depends on everything)
muse/cli/commands/impact.py 11 symbols 42 deps ███░░░ typed 55%
────────────────────────────────────────────────────────────
Total: 8 phases 497 files 14 642 symbols
Estimated porting complexity: 847 symbol-days (see --complexity)
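The "typed" column reads as annotation coverage. The Key Files table credits that number to muse/core/type_analysis.py but gives no formula. Below is a minimal sketch that uses only the standard-library ast module. It assumes coverage means the share of parameter and return-annotation slots that are filled, with parameters named self or cls excluded. annotation_coverage is a hypothetical name, not an existing Muse function.

```python
import ast


def annotation_coverage(source: str) -> float:
    """Percentage of annotatable slots (parameters + return) that carry an annotation.

    Hypothetical helper: counts every function and method in the module and
    skips parameters named self or cls. A module with no functions scores 100.0.
    """
    tree = ast.parse(source)
    slots = 0
    annotated = 0
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        params = args.posonlyargs + args.args + args.kwonlyargs  # new list
        if args.vararg:
            params.append(args.vararg)
        if args.kwarg:
            params.append(args.kwarg)
        for param in params:
            if param.arg in ("self", "cls"):
                continue
            slots += 1
            annotated += param.annotation is not None
        # The return annotation counts as one more slot.
        slots += 1
        annotated += node.returns is not None
    return 100.0 * annotated / slots if slots else 100.0
```

Because the formula is an assumption, its numbers will not necessarily match what type_analysis.py reports.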
Phase 2 — Symbol-by-symbol translation guidance
For each symbol, emit a structured translation card:
muse code port --from Python --to Rust muse/core/object_store.py::read_object
Symbol: muse/core/object_store.py::read_object
Kind: async_function → Rust async fn
Type: (object_id: str, repo_path: Path) -> bytes | None
Typed: Yes (100%)
Callers: 47 (blast radius depth 1)
Calls: _read_local, _read_s3, decompress_lz4, validate_object_id (4 direct)
Rust translation hints:
str → &str or String (choose &str for borrowed, String if stored)
Path → std::path::Path / PathBuf
bytes | None → Option<Vec<u8>>
async → tokio::main or async fn in async context
LZ4 decompression → lz4_flex crate
Invariants detected (from call history + type analysis):
- object_id always matches ^sha256:[0-9a-f]{64}$ (validated at 3 call sites)
- Never returns empty Vec — callers assume None means absent, b"" is invalid
- read_object is called 47× but _read_s3 is only reached when R2_BUCKET is set
Risks:
- HIGH: async error handling style differs. Python returns None on S3 botocore
exceptions; Rust equivalent should use Result<Option<Vec<u8>>, StorageError>
- MED: Path::resolve() has different symlink semantics on Linux vs macOS
- LOW: LZ4 block format vs frame format — verify crate compatibility
Phase 3 — Port progress tracking
Once you start creating .rs files alongside the originals, muse code port tracks progress:
muse code port --status
Python → Rust port progress
────────────────────────────────────
Phase 1 ████████████████████ 100% 12/12 symbols
Phase 2 ████████░░░░░░░░░░░░ 42% 18/43 symbols
Phase 3 ░░░░░░░░░░░░░░░░░░░░ 0% 0/89 symbols
...
────────────────────────────────────
Total: 30/497 files (6.0%)
Next recommended: muse/core/schema.py (18 symbols, all dependencies satisfied)
Phase 4 — Semantic equivalence verification
After porting a symbol, verify semantic equivalence across language boundary:
muse code port --verify muse/core/object_store.py::read_object \
--against src/object_store.rs::read_object
Semantic diff: Python read_object ↔ Rust read_object
Signature match: ✅ (str/&str, Path/PathBuf, Option<Vec<u8>>)
Invariant coverage: ⚠️ Python validates object_id format; Rust version does not
Error paths: ❌ Python swallows S3 errors (returns None); Rust propagates Err
Async model: ✅ Both async
Complexity: Python cyclomatic 7, Rust cyclomatic 5 (simplified)
Architecture
New command entrypoint
muse/cli/commands/port.py
Registered as the port subcommand of muse code, alongside the existing impact, deps, dead, etc. The registration point is CodePlugin in muse/plugins/code/plugin.py.
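The proposal does not show CodePlugin's registration API. The Muse CLI is argparse-based, though: the Rust plan for Muse itself mentions an argparse → clap migration. So the flags used throughout this document could plausibly be declared as in the standalone sketch below. build_port_parser is a hypothetical name. One wrinkle surfaces immediately: --to does double duty. With --from it names the target language; with --link it names the target symbol. The sketch therefore stores --to once and leaves its interpretation to the selected mode.

```python
import argparse


def build_port_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for `muse code port`, mirroring the flags in this proposal."""
    parser = argparse.ArgumentParser(prog="muse code port")
    parser.add_argument("target", nargs="?", help="file or file::symbol to inspect")
    # "from" is a Python keyword, so the value is stored under a different name.
    parser.add_argument("--from", dest="source_lang", help="source language, e.g. Python")
    # Target language when used with --from; target symbol when used with --link.
    parser.add_argument("--to", dest="to", help="target language or target symbol")
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--status", action="store_true", help="show port progress")
    mode.add_argument("--verify", metavar="SOURCE_SYMBOL", help="check equivalence")
    mode.add_argument("--link", metavar="SOURCE_SYMBOL", help="record a port link")
    mode.add_argument("--drift", action="store_true", help="report source drift")
    parser.add_argument("--against", metavar="TARGET_SYMBOL", help="target for --verify")
    parser.add_argument("--complexity", action="store_true", help="show cost estimates")
    return parser
```

Putting the modes in a mutually exclusive group makes argparse reject combinations like --status --drift with a usage error.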
Core engine
muse/plugins/code/_port_engine.py
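from __future__ import annotations

from typing import TypedDict

from muse.core.symbol_cache import SymbolCache
from muse.plugins.code._callgraph import ForwardGraph, ReverseGraph
from muse.plugins.code.ast_parser import SymbolRecord
# Manifest, AnnotationMap, SymbolTranslationCard: import paths not fixed by this proposal.

class PortPlan(TypedDict):
    phases: list[PortPhase]
    total_files: int
    total_symbols: int
    source_lang: str
    target_lang: str

class PortPhase(TypedDict):
    phase: int
    files: list[PortFileEntry]
    all_deps_satisfied_by: list[int]  # earlier phase numbers

class PortFileEntry(TypedDict):
    path: str
    symbols: int
    internal_deps: list[str]  # other files in the repo it imports
    type_coverage_pct: float
    complexity_score: float  # cyclomatic complexity aggregate

def build_port_plan(
    manifest: Manifest,
    symbol_cache: SymbolCache,
    source_lang: str,
    target_lang: str,
) -> PortPlan: ...

def symbol_translation_card(
    address: str,
    symbol: SymbolRecord,
    reverse_graph: ReverseGraph,
    forward_graph: ForwardGraph,
    type_map: AnnotationMap,
    target_lang: str,
) -> SymbolTranslationCard: ...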
def symbol_translation_card(
address: str,
symbol: SymbolRecord,
reverse_graph: ReverseGraph,
forward_graph: ForwardGraph,
type_map: AnnotationMap,
target_lang: str,
) -> SymbolTranslationCard: ...
Topological sort (why it matters)
The key primitive is already partially implemented via muse/plugins/code/_callgraph.py (ForwardGraph) and muse/cli/commands/deps.py (import graph). build_port_plan extends this:
- Build the file-level import DAG (already available via deps)
- Topologically sort the DAG into phases, leaf nodes (no internal imports) first
- Within each phase, sort by symbol count ascending (cheapest first, for quick wins)
- Surface type coverage and cyclomatic complexity as per-file porting cost signals
This is the right order to port a codebase — it ensures that when you implement muse/core/object_store.rs, all its dependencies are already ported. Git has no concept of this ordering. It cannot even tell you the import graph.
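The phase assignment can be sketched as a layered variant of Kahn's algorithm. Each phase holds the files whose internal imports all lie in earlier phases, sorted by symbol count ascending. This is a minimal sketch. It assumes the import graph arrives as a plain dict from file to the set of repo files it imports; the real input would come from deps. layer_into_phases is a hypothetical name. Python import cycles mean the graph is not guaranteed to be a DAG, so the sketch returns any files stuck on or behind a cycle instead of silently dropping them. Collapsing strongly connected components into a single porting unit is one way a fuller implementation could place them.

```python
from collections import defaultdict


def layer_into_phases(
    imports: dict[str, set[str]],
    symbol_counts: dict[str, int],
) -> tuple[list[list[str]], set[str]]:
    """Group files into porting phases; return (phases, files_blocked_by_cycles).

    imports maps each file to the repo-internal files it imports.
    Phase 1 holds leaf files; phase k holds files whose deps all sit in phases < k.
    """
    files = set(imports) | {dep for deps in imports.values() for dep in deps}
    remaining = {f: set(imports.get(f, ())) for f in files}  # unsatisfied deps
    dependents: dict[str, set[str]] = defaultdict(set)
    for f, deps in remaining.items():
        for dep in deps:
            dependents[dep].add(f)

    phases: list[list[str]] = []
    ready = [f for f, deps in remaining.items() if not deps]
    while ready:
        # Cheapest first within a phase: fewest symbols, then path for stable output.
        ready.sort(key=lambda f: (symbol_counts.get(f, 0), f))
        phases.append(ready)
        next_ready = []
        for f in ready:
            del remaining[f]
            for parent in dependents[f]:
                remaining[parent].discard(f)
                if not remaining[parent]:
                    next_ready.append(parent)
        ready = next_ready
    # Whatever is left still has an unsatisfied dep, i.e. it is on or behind a cycle.
    return phases, set(remaining)
```

Phase k in the output corresponds to phases[k - 1]. That maps directly onto PortPhase.phase, with all_deps_satisfied_by being the lower phase numbers.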
Language type mapping
muse/plugins/code/_type_maps.py
PYTHON_TO_RUST: dict[str, str] = {
    "str": "&str / String",
    "int": "i64 / usize",
    "float": "f64",
    "bool": "bool",
    "bytes": "Vec<u8> / &[u8]",
    "None": "()",
    "X | None": "Option<X>",
    "list[X]": "Vec<X>",
    "dict[K,V]": "HashMap<K, V>",
    "set[X]": "HashSet<X>",
    "tuple[X,Y]": "(X, Y)",
    "Path": "std::path::PathBuf",
    "Exception": "Box<dyn std::error::Error>",
    "Callable": "fn() / Box<dyn Fn()>",
    "Iterator": "impl Iterator<Item=X>",
    "AsyncIterator": "impl Stream<Item=X>",  # futures::Stream
}
PYTHON_TO_TYPESCRIPT: dict[str, str] = { ... }
PYTHON_TO_GO: dict[str, str] = { ... }
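The table's keys mix concrete names ("str") with patterns ("X | None", "list[X]"). Applying it to a real signature therefore means walking the annotation's structure rather than doing a string lookup. The sketch below assumes Python 3.9+ (for ast.unparse and the modern Subscript layout). It also assumes the owned variant is chosen wherever the table offers a pair: String over &str, i64 over usize, Vec<u8> over &[u8]. RUST_LEAVES, RUST_GENERICS and to_rust are hypothetical names, not the contents of _type_maps.py.

```python
import ast

# Hypothetical owned-type choices for the leaf entries of a PYTHON_TO_RUST-style table.
RUST_LEAVES = {
    "str": "String",
    "int": "i64",
    "float": "f64",
    "bool": "bool",
    "bytes": "Vec<u8>",
    "None": "()",
    "Path": "std::path::PathBuf",
}
RUST_GENERICS = {"list": "Vec", "dict": "HashMap", "set": "HashSet"}


def to_rust(annotation: str) -> str:
    """Translate a Python annotation string into a Rust type string."""
    return _convert(ast.parse(annotation, mode="eval").body)


def _convert(node: ast.expr) -> str:
    # X | None  ->  Option<X>; other unions have no direct Rust equivalent.
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
        non_none = [
            side for side in (node.left, node.right)
            if not (isinstance(side, ast.Constant) and side.value is None)
        ]
        if len(non_none) == 1:
            return f"Option<{_convert(non_none[0])}>"
        raise ValueError("general unions need a Rust enum; map them by hand")
    if isinstance(node, ast.Constant) and node.value is None:
        return RUST_LEAVES["None"]
    if isinstance(node, ast.Name):
        return RUST_LEAVES.get(node.id, node.id)  # unknown names pass through
    if isinstance(node, ast.Subscript) and isinstance(node.value, ast.Name):
        args = node.slice.elts if isinstance(node.slice, ast.Tuple) else [node.slice]
        inner = [_convert(a) for a in args]
        if node.value.id == "Optional":
            return f"Option<{inner[0]}>"
        if node.value.id == "tuple":
            return "(" + ", ".join(inner) + ")"
        generic = RUST_GENERICS.get(node.value.id, node.value.id)
        return f"{generic}<{', '.join(inner)}>"
    raise ValueError(f"unsupported annotation: {ast.unparse(node)}")
```

For read_object's return type, to_rust("bytes | None") yields Option<Vec<u8>>, which matches the translation card.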
Invariant extraction
muse/plugins/code/_invariant_extractor.py
Mining invariants from the call graph and type history — things that are enforced at runtime but not in the type signature — is the hardest and most valuable part:
- Regex-pattern invariants: scan callers of a function for re.match(pattern, arg) guards before the call, and surface them as "arg always matches pattern"
- Nullability invariants: trace None-check patterns; if callers always guard before using the result, document it as "may return None; callers must check"
- Ordering invariants: detect when function A is always called before function B (via call-graph adjacency in ForwardGraph) and surface this as a "call ordering dependency"
- Historical invariants: use walk_history from muse/core/query_engine.py to scan commit messages for assert, invariant, precondition, and contract keywords near the symbol
Cross-language semantic diff
muse/plugins/code/_cross_lang_diff.py
Compare a source-language symbol and its claimed equivalent in the target language:
- Signature mapping: apply type map, flag mismatches
- Error handling model: Python raise/None vs. Rust Result/? vs. Go error; detect the mismatch
- Async model: Python asyncio vs. Rust tokio/async-std vs. Go goroutines
- Ownership model: Python GC vs. Rust borrow checker hints (are lifetime annotations needed?)
- Complexity comparison: cyclomatic complexity per symbol from AST walk — if Rust version is significantly higher/lower, flag for review
Port Progress as a First-Class VCS Concept
The really powerful insight: porting progress should live in Muse itself, not in a spreadsheet.
When you create src/object_store.rs as the port of muse/core/object_store.py, you annotate the relationship:
muse code port --link muse/core/object_store.py::read_object \
--to src/object_store.rs::read_object
This writes a .muse/port-map.toml file:
[port_map]
source_lang = "Python"
target_lang = "Rust"
[[port_map.links]]
source = "muse/core/object_store.py::read_object"
target = "src/object_store.rs::read_object"
status = "ported" # ported | in-progress | needs-review | verified
verified_at = "a3f2c9e1" # commit at which equivalence was last verified
Now muse code port --status reads the port map and gives you a live progress dashboard. And muse code port --verify reruns semantic diff against the verified commit, flagging if the source has since drifted (new callers, changed signature, added invariants).
This is port-aware history. If a new parameter is added to read_object in commit deadbeef, muse code port --drift immediately surfaces:
PORT DRIFT DETECTED
muse/core/object_store.py::read_object
Last verified: a3f2c9e1
Current: deadbeef
Changes since verification:
+ parameter: compress: bool = False (added in commit deadbeef)
Callers updated: 12 sites
Rust equivalent: src/object_store.rs::read_object — NOT YET UPDATED
Action: re-port or extend the Rust function to match new signature
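The "Changes since verification" block implies diffing the symbol at the verified commit against the current one. A minimal sketch of the parameter-level part follows. It assumes the caller has already fetched both versions of the file's source, for example from the snapshots at a3f2c9e1 and deadbeef. _params and signature_drift are hypothetical names. *args, **kwargs and return types are left out for brevity. Since Muse's symbol diff already classifies signature changes, a fuller implementation would likely reuse it rather than re-parse.

```python
import ast


def _params(source: str, name: str) -> dict[str, str]:
    """Map each parameter of function `name` to 'name: annotation = default'."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            args = node.args
            positional = args.posonlyargs + args.args
            # Defaults align with the *last* len(defaults) positional parameters.
            pos_defaults = [None] * (len(positional) - len(args.defaults)) + list(args.defaults)
            pairs = list(zip(positional, pos_defaults)) + list(zip(args.kwonlyargs, args.kw_defaults))
            rendered = {}
            for arg, default in pairs:
                text = arg.arg
                if arg.annotation is not None:
                    text += f": {ast.unparse(arg.annotation)}"
                if default is not None:
                    text += f" = {ast.unparse(default)}"
                rendered[arg.arg] = text
            return rendered
    raise KeyError(name)


def signature_drift(old_source: str, new_source: str, name: str) -> list[str]:
    """Describe parameters added, removed, or changed between two versions of `name`."""
    old, new = _params(old_source, name), _params(new_source, name)
    changes = [f"+ parameter: {new[p]}" for p in new if p not in old]
    changes += [f"- parameter: {old[p]}" for p in old if p not in new]
    changes += [f"~ parameter: {old[p]} -> {new[p]}" for p in new if p in old and old[p] != new[p]]
    return changes
```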
The Python → Rust Port of Muse Itself
This feature is also the roadmap for rewriting Muse in Rust. Running muse code port --from Python --to Rust against the muse repo would output:
- Phase 1 (leaf modules, ~40 files): muse/core/errors.py, muse/core/validation.py, muse/core/_types.py, muse/core/schema.py, muse/core/semver.py, muse/core/bip39.py
- Phase 2 (~60 files): muse/core/object_store.py, muse/core/snapshot.py, muse/core/dag.py, muse/core/identity.py, muse/core/msign.py, muse/core/keypair.py
- Phase 3 (~80 files): muse/core/repo.py, muse/core/store.py, muse/core/merge_engine.py, muse/core/rebase.py, muse/core/gc.py
- Phase 4 (~50 files): muse/plugins/code/ast_parser.py (tree-sitter, which already has Rust bindings), muse/plugins/code/_callgraph.py, muse/plugins/code/_query.py
- Phase 5+: CLI layer, muse/cli/commands/ (~170 files; high porting complexity due to the argparse → clap migration)
The first deliverable of muse code port is its own porting plan.
Key Files and Symbols
| File | Role |
|---|---|
| muse/plugins/code/plugin.py | Register the port subcommand in CodePlugin |
| muse/plugins/code/ast_parser.py::SymbolRecord | Source of content_id, body_hash, signature_id per symbol |
| muse/plugins/code/ast_parser.py::LanguageAdapter | Protocol all language backends implement; extend for cross-language type mapping |
| muse/plugins/code/_callgraph.py::ForwardGraph | Answers "what does this function call?"; foundation for the port-order topological sort |
| muse/plugins/code/_callgraph.py::ReverseGraph | Answers "what calls this?"; surfaces the blast radius of any port decision |
| muse/plugins/code/_callgraph.py::build_reverse_graph | Entry point for building the full call graph |
| muse/plugins/code/_query.py::symbols_for_snapshot | Extract the full SymbolTree from a committed snapshot |
| muse/plugins/code/_query.py::language_of | Language classification by file extension; already covers Python, Rust, and 15 others |
| muse/core/symbol_cache.py::SymbolCache | 60× faster symbol extraction via a content-addressed cache |
| muse/core/type_analysis.py | Annotation coverage, Any blast radius, migration targets; feeds the type-coverage column in the port plan |
| muse/core/query_engine.py::walk_history | Walk the commit DAG for invariant mining |
| muse/cli/commands/deps.py | File-level import graph; foundation for the phase topology |
| muse/cli/commands/impact.py | Transitive blast radius; muse code port reuses this for "porting this symbol forces porting these N others" |
| muse/cli/commands/detect_refactor.py | Semantic operation history; can detect whether a Python symbol was already partially Rust-ified |
| muse/cli/commands/dead.py | Dead code detection; symbols with no callers can be deprioritized in the port plan |
| muse/cli/commands/breakage.py | After porting, run a breakage check on the new language target |
| muse/core/invariants.py | Existing invariant infrastructure; extend for cross-language invariant extraction |
| muse/cli/commands/port.py | New file: the muse code port command entrypoint |
| muse/plugins/code/_port_engine.py | New file: PortPlan, PortPhase, PortFileEntry, build_port_plan |
| muse/plugins/code/_type_maps.py | New file: PYTHON_TO_RUST, PYTHON_TO_TYPESCRIPT, PYTHON_TO_GO |
| muse/plugins/code/_invariant_extractor.py | New file: regex/nullability/ordering invariant mining |
| muse/plugins/code/_cross_lang_diff.py | New file: semantic equivalence verification across the language boundary |
Success Criteria
- muse code port --from Python --to Rust produces a topologically sorted porting plan for the muse repo itself
- Each symbol card includes: callers, callees, type mapping, detected invariants, risk flags
- muse code port --link writes .muse/port-map.toml and --status reads it
- muse code port --verify runs a semantic diff between the source and target symbol
- muse code port --drift detects when the source has changed since the last verification
- The port plan respects the --language filter (port only muse/core/ first, not all 497 files)
- Works for Python→Rust, Python→Go, Python→TypeScript, and any language pair where both sides have AST adapters in ast_parser.py
- muse code port run on the muse repo produces a coherent Phase 1–8 ordering that a Rust engineer could execute sequentially with zero ambiguity
Prior Art
- c2rust: mechanical C-to-Rust transpiler; produces unsafe Rust; no semantic understanding
- py2rust: abandoned; no call-graph or invariant awareness
- LLM-assisted rewrites (GPT-4, Claude): hallucinate invariants, miss call ordering, no progress tracking
- Muse's muse code port: the first porting tool that understands the semantic topology of a codebase and produces a dependency-ordered, invariant-annotated, progress-tracked porting plan grounded entirely in the committed symbol graph