Decentralize MuseHub: IPFS-backed, censorship-resistant, trustless object storage
Vision
MuseHub today is a capable, content-addressed VCS server — but it runs on one AWS EC2 instance in us-east-1, stores blobs in Cloudflare R2 (or local disk), and routes everything through a Cloudflare origin. Any of those three can be taken down, rate-limited, or coerced by a state actor. That is the opposite of what a sovereign, musician-owned platform should be.
The goal of this issue is to make MuseHub structurally incapable of being censored — not just "resilient" in the cloud sense (multi-AZ, CDN edge nodes), but genuinely decentralised: no single operator can unilaterally delete a repo, silence an identity, or cut off access to committed objects.
Where We Are Now
Infrastructure single points of failure
| Layer | Current | Risk |
|---|---|---|
| Compute | EC2 i-0855d6efe7fa1a49d (us-east-1) | AWS ToS, region outage, IAM mishap |
| TLS termination | Cloudflare Full(Strict) + Cloudflare IPs only on SG | Cloudflare can block the domain |
| Object storage | Cloudflare R2 (S3Backend) or local /data/musehub/objects | R2 account suspended = objects gone |
| Database | Postgres on the same instance | One disk, no replicas, no HA |
| Auth | Ed25519 keys verified server-side by MuseHub | Server = trust anchor |
| Deploy pipe | AWS ECR + SSM | AWS credentials control the gate |
Code: musehub/storage/backends.py::get_backend selects R2 or local. musehub/config.py holds all the R2 credentials. Deploy is Docker via deploy/push.sh + ECR.
The Target State
Every object a musician commits should be pinned in IPFS and retrievable by CID regardless of whether musehub.ai is alive. Every identity should be verifiable on-chain. No single server should hold the only copy of a commit, a snapshot, or a release.
Proposed Layers
1. IPFS as a first-class storage backend
Add IPFSBackend alongside LocalBackend and S3Backend in musehub/storage/backends.py.
class IPFSBackend:
    """Content-addressed storage backed by IPFS (via Kubo HTTP API or web3.storage/nft.storage).

    URI scheme: ipfs://<CIDv1>
    Every put() pins the object and records the CID. get() uses the gateway
    (configurable: local node, infura, dweb.link) with local-node fallback.
    exists() = check the local pin set first, then the DHT.
    """

    async def put(self, object_id: str, data: bytes) -> str: ...
    async def get(self, object_id: str) -> bytes | None: ...
    async def pin(self, cid: str) -> None: ...  # explicit pin for persistence
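The retrieval order in the docstring (gateway first, local node as fallback) can be sketched as a small helper that lists where get() would look. The function name `retrieval_candidates` is hypothetical and not part of MuseHub. The `/ipfs/<cid>` gateway path and Kubo's `POST /api/v0/cat?arg=<cid>` RPC are the standard forms, but how IPFSBackend would actually issue the requests is an assumption.

```python
from typing import List, Optional, Tuple


def retrieval_candidates(
    cid: str,
    gateway_url: str = "https://dweb.link",
    api_url: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Return (HTTP method, URL) pairs to try, in order, when fetching a CID.

    Hypothetical helper: the configured gateway is tried first, then the
    local Kubo node if one is configured.
    """
    candidates = [("GET", f"{gateway_url.rstrip('/')}/ipfs/{cid}")]
    if api_url:
        # Kubo's RPC API only accepts POST requests.
        candidates.append(("POST", f"{api_url.rstrip('/')}/api/v0/cat?arg={cid}"))
    return candidates
```

Keeping the ordering in one pure function makes it easy to unit-test and to reorder later, for example to prefer the local node once most reads are served from it.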
The object_id → CID mapping must be persisted in musehub_objects so the DB remains the single source of truth for what a repo contains, even if the retrieval path changes.
Dual-write phase (non-breaking): Write to both R2 and IPFS on put. Read from R2 first, fall back to IPFS. This lets us migrate without a big-bang cutover.
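A minimal sketch of the dual-write phase, assuming both backends expose the async put()/get() interface shown above. `DualWriteBackend` and `MemoryBackend` are hypothetical names used only for illustration; `MemoryBackend` stands in for S3Backend and IPFSBackend so the example runs without external services.

```python
import asyncio
from typing import Dict, Optional, Protocol


class Backend(Protocol):
    async def put(self, object_id: str, data: bytes) -> str: ...
    async def get(self, object_id: str) -> Optional[bytes]: ...


class MemoryBackend:
    """In-memory stand-in for S3Backend / IPFSBackend (illustration only)."""

    def __init__(self, scheme: str) -> None:
        self.scheme = scheme
        self.blobs: Dict[str, bytes] = {}

    async def put(self, object_id: str, data: bytes) -> str:
        self.blobs[object_id] = data
        return f"{self.scheme}://{object_id}"

    async def get(self, object_id: str) -> Optional[bytes]:
        return self.blobs.get(object_id)


class DualWriteBackend:
    """Write to both backends; read from primary, fall back to secondary."""

    def __init__(self, primary: Backend, secondary: Backend) -> None:
        self.primary = primary
        self.secondary = secondary

    async def put(self, object_id: str, data: bytes) -> str:
        # Both writes run concurrently; an exception from either propagates.
        primary_uri, _ = await asyncio.gather(
            self.primary.put(object_id, data),
            self.secondary.put(object_id, data),
        )
        return primary_uri

    async def get(self, object_id: str) -> Optional[bytes]:
        data = await self.primary.get(object_id)
        if data is None:
            data = await self.secondary.get(object_id)
        return data
```

Whether a failed IPFS write should fail the whole put(), as it does here, or only be logged and retried is a design decision the migration would need to settle.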
Pinning services to integrate:
- Kubo (self-hosted, POST /api/v0/add)
- web3.storage (CAR upload API)
- Pinata (pinning API)
- nft.storage (for immutable releases)
Config additions in musehub/config.py:
ipfs_api_url: str | None = None # http://localhost:5001 for local Kubo
ipfs_gateway_url: str = "https://dweb.link"
ipfs_pinning_service: str | None = None # "pinata" | "web3storage" | "nftdotio"
ipfs_pinning_api_key: str | None = None
get_backend() in musehub/storage/backends.py gains a new branch:
if settings.ipfs_api_url:
    return IPFSBackend()
2. Content-addressed objects already have CIDs
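This is the key insight: Muse objects are already content-addressed by SHA-256. A sha256:<hex> ID is not itself a CID, but its digest is exactly the multihash payload of a CIDv1 that uses the raw codec (0x55) and the sha2-256 hash function (0x12). No re-hashing or UnixFS wrapping is needed, provided the object is stored in IPFS as a single raw block; an object large enough to be chunked into a UnixFS DAG gets a different root CID.
CIDv1 = "b" + base32lower( CID_version(0x01) + codec(raw=0x55) + multihash(sha2-256=0x12, length=0x20, digest) )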
We can compute the CID deterministically from object_id without hitting IPFS at all, making the mapping lossless.
Implementation lives in a new musehub/crypto/cids.py:
def object_id_to_cid(object_id: str) -> str:
    """Convert sha256:<hex> object_id to a CIDv1 raw-codec string."""
    ...

def cid_to_object_id(cid: str) -> str:
    """Inverse: extract the sha256 hex from a CIDv1 raw-codec CID."""
    ...
This means every musehub_objects.object_id has a canonical IPFS CID already — we just haven't been exposing it.
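One possible implementation of the two functions, using only the standard library. It assumes object IDs always have the form sha256:<64 hex chars> and emits the lowercase base32 form (multibase prefix "b") that IPFS tooling prints by default. The error messages and the `_RAW_SHA256_PREFIX` constant are illustrative choices, not existing MuseHub code.

```python
import base64

# CIDv1 (0x01), raw codec (0x55), sha2-256 (0x12), digest length 32 (0x20).
_RAW_SHA256_PREFIX = bytes([0x01, 0x55, 0x12, 0x20])


def object_id_to_cid(object_id: str) -> str:
    """Convert sha256:<hex> object_id to a CIDv1 raw-codec string."""
    scheme, _, hex_digest = object_id.partition(":")
    digest = bytes.fromhex(hex_digest)  # raises ValueError on bad hex
    if scheme != "sha256" or len(digest) != 32:
        raise ValueError(f"not a sha256 object id: {object_id!r}")
    encoded = base64.b32encode(_RAW_SHA256_PREFIX + digest).decode("ascii")
    # Multibase "b" = RFC 4648 base32, lowercase, no padding.
    return "b" + encoded.rstrip("=").lower()


def cid_to_object_id(cid: str) -> str:
    """Inverse: extract the sha256 hex from a CIDv1 raw-codec CID."""
    if not cid.startswith("b"):
        raise ValueError("expected a base32 (multibase 'b') CID")
    body = cid[1:].upper()
    # Restore the padding that multibase strips before decoding.
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    if len(raw) != 36 or raw[:4] != _RAW_SHA256_PREFIX:
        raise ValueError("not a CIDv1 raw-codec sha2-256 CID")
    return "sha256:" + raw[4:].hex()
```

Because the four prefix bytes are constant, every CID produced this way starts with "bafkrei", which gives a quick visual check that an object was addressed with the raw codec.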
3. On-chain identity anchoring (ERC-8004 Agent Identity Standard)
MSign is great — Ed25519 keys, no passwords, no JWTs. But the trust anchor is still MuseHub's Postgres database. If the DB is wiped, all identities are gone.
Anchor public keys on the Avalanche L1. Each handle maps to an on-chain record:
handle → {ed25519_public_key, ipfs_cid_of_key_bundle, nonce, revocation_list}
Auth flow becomes:
- Client signs with Ed25519 (unchanged).
- MuseHub verifies the sig (unchanged via musehub/auth/request_signing.py).
- MuseHub additionally checks that the key is anchored on-chain (cached, refreshed on key rotation).
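The third step could look roughly like the following. This is a sketch under stated assumptions: `AnchoredKeyCache` is a hypothetical class, the chain lookup is injected as a plain callable because the contract interface is not specified here, and a fixed TTL stands in for whatever rotation signal the chain would provide.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class AnchoredKeyCache:
    """Cache handle -> anchored public key, refreshing after ttl_seconds."""

    def __init__(
        self,
        lookup: Callable[[str], Optional[bytes]],  # hypothetical on-chain query
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Optional[bytes]]] = {}

    def anchored_key(self, handle: str) -> Optional[bytes]:
        now = self._clock()
        cached = self._entries.get(handle)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        key = self._lookup(handle)
        self._entries[handle] = (now, key)
        return key

    def invalidate(self, handle: str) -> None:
        """Drop the cached entry, e.g. after a key rotation event."""
        self._entries.pop(handle, None)

    def is_anchored(self, handle: str, presented_key: bytes) -> bool:
        """True if the key used to sign the request matches the anchored one."""
        anchored = self.anchored_key(handle)
        return anchored is not None and anchored == presented_key
```

The signature check itself stays where it is today; this only answers whether the key that verified the request is the one the chain currently lists for the handle.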
musehub/crypto/keys.py is already designed for algorithm agility (KeyAlgorithm.ED25519, with an ML-DSA-65 placeholder). The on-chain record can be upgraded to post-quantum keys without breaking the DB layer, because the chain holds the authoritative key, not the DB.
4. Commit manifests pinned as CAR files
Every muse push produces a bundle of objects + a commit + a snapshot manifest. That bundle should become a Content ARchive (CAR) pinned atomically to IPFS.
CAR file structure:
CARv1 header
  roots: [commit_cid]
blocks:
  commit_cid → msgpack(commit)
  snapshot_cid → msgpack(snapshot manifest)
  object_cid[0..n] → raw object bytes
This is a complete, self-verifying snapshot of a repo state at a point in time. Anyone with the root CID can reconstruct the entire commit without MuseHub.
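To make the layout concrete, here is a minimal CARv1 writer using only the standard library. It is a sketch under simplifying assumptions: every block, including the commit, is addressed with the raw codec, there is exactly one root, and the DAG-CBOR header is hand-encoded for a 36-byte CID instead of using a CBOR library. `build_car` and its helpers are hypothetical names, not the API of musehub_car_exporter.py.

```python
import hashlib
from typing import Iterable


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128, as used for CAR length prefixes."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def raw_cid(data: bytes) -> bytes:
    """Binary CIDv1: version 1, raw codec (0x55), sha2-256 multihash."""
    return bytes([0x01, 0x55, 0x12, 0x20]) + hashlib.sha256(data).digest()


def car_header(root: bytes) -> bytes:
    """Length-prefixed DAG-CBOR for {"roots": [root], "version": 1}."""
    if len(root) != 36:
        raise ValueError("this sketch only handles 36-byte raw sha2-256 CIDs")
    cbor = (
        b"\xa2"                   # map with 2 entries
        + b"\x65" + b"roots"      # text string, length 5
        + b"\x81"                 # array with 1 element
        + b"\xd8\x2a"             # tag 42 marks a CID
        + b"\x58\x25\x00" + root  # byte string, length 37: 0x00 prefix + CID
        + b"\x67" + b"version"    # text string, length 7
        + b"\x01"                 # unsigned integer 1
    )
    return encode_varint(len(cbor)) + cbor


def build_car(root_block: bytes, other_blocks: Iterable[bytes]) -> bytes:
    """Serialize the root block followed by the remaining blocks."""
    out = bytearray(car_header(raw_cid(root_block)))
    for data in [root_block, *other_blocks]:
        cid = raw_cid(data)
        # Each section: varint(len(cid) + len(data)) | cid | data
        out += encode_varint(len(cid) + len(data)) + cid + data
    return bytes(out)
```

A reader can verify the archive without trusting its source by re-hashing each block and comparing the result with the CID stored next to it.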
New endpoint: GET /{owner}/{repo}/commits/{ref}/car — stream the CAR file.
New service: musehub/services/musehub_car_exporter.py
5. DHT-backed repo discovery (no DNS required)
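If the musehub.ai DNS is seized, users still need to find each other. A libp2p Kademlia DHT can store repo advertisements. The public IPFS DHT only accepts records under the /pk and /ipns namespaces, so this needs either a MuseHub-specific DHT with its own record validator or an IPNS record that points at the advertisement: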
DHT key: /musehub/repo/{owner}/{slug}
DHT value: {
  hub_url: "https://musehub.ai",
  fallback_peer_ids: ["12D3KooW..."],
  latest_head_cid: "bafy...",
  signed_by: <ed25519 sig of owner>
}
Muse CLI's push command would also publish to the DHT (behind a feature flag). muse pull would resolve from DHT if the primary hub is unreachable.
6. Federated hubs (MuseHub-to-MuseHub sync)
The protocol between Muse CLI and MuseHub is already well-defined (push, pull, filter-objects, coord). The same protocol can run hub-to-hub:
- A musician self-hosts a MuseHub instance (Docker Compose, a Raspberry Pi, a VPS).
- They designate it as a replica of their repos on staging.musehub.ai.
- On every push, MuseHub fans out to all registered replica hubs.
- Consumers can pull from any hub that has the content.
New tables:
musehub_federation_peers (
peer_id uuid primary key,
hub_url text not null,
owner_handle text, -- null = mirrors everything
last_sync_at timestamptz,
sync_status text -- 'active' | 'lagging' | 'offline'
);
New service: musehub/services/musehub_federation.py
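The fan-out step could be sketched as follows. The `fan_out` function and its `send` callable are hypothetical; `send` stands in for whatever hub-to-hub push the service ends up using. Mapping a timeout to 'lagging' and any other failure to 'offline' is an assumption about how sync_status would be derived.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def fan_out(
    peer_urls: List[str],
    send: Callable[[str], Awaitable[None]],
    timeout_seconds: float = 10.0,
) -> Dict[str, str]:
    """Push to every replica concurrently; return a sync_status per hub_url."""

    async def push_one(url: str) -> str:
        try:
            await asyncio.wait_for(send(url), timeout=timeout_seconds)
            return "active"
        except asyncio.TimeoutError:
            return "lagging"   # reachable in principle, but too slow
        except Exception:
            return "offline"   # any other failure: treat the peer as down

    statuses = await asyncio.gather(*(push_one(url) for url in peer_urls))
    return dict(zip(peer_urls, statuses))
```

Because each peer is handled in its own task, one slow or unreachable replica does not delay the others, and the returned mapping can be written straight into musehub_federation_peers.sync_status.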
7. Immutable releases pinned to nft.storage
musehub/services/musehub_releases.py and musehub_release_packager.py already package releases. Release artifacts (.muse-release bundles) should be pinned to nft.storage — IPFS storage backed by Filecoin, free for public data, permanent.
After pinning, record the IPFS CID in musehub_releases.ipfs_cid. The release page shows a permanent IPFS link alongside the MuseHub URL. Even if MuseHub disappears, the release stays retrievable by CID for as long as the pin and the underlying Filecoin storage deals are maintained.
8. Mist domain: decentralized namespace
musehub/services/musehub_mists.py handles .muse domain registration. Centralised DNS for a decentralised VCS is a contradiction. Long-term, Mist domains should resolve on-chain:
- ENS-style: gabriel.muse → on-chain record → hub URL + IPFS CID of HEAD
- Or: Avalanche L1 native name service, integrated with ERC-8004 agent identity
Migration Path (phased, non-breaking)
| Phase | Scope | Breaking? |
|---|---|---|
| 0 | Add musehub/crypto/cids.py: compute IPFS CIDs from existing object_ids | No |
| 1 | IPFSBackend: dual-write to R2 + IPFS; read from R2 first | No |
| 2 | Expose GET /{owner}/{repo}/commits/{ref}/car (CAR export endpoint) | No |
| 3 | Anchor public keys on-chain; request_signing.py does on-chain lookup with DB cache | No |
| 4 | DHT repo advertisement in muse push (feature-flagged) | No |
| 5 | Federation peer table + fan-out on push | No |
| 6 | nft.storage pin on musehub_releases create | No |
| 7 | Mist domains → on-chain resolution | Gradual |
Key Files
| File | Role in decentralization |
|---|---|
| musehub/storage/backends.py | Add IPFSBackend; update get_backend() |
| musehub/crypto/keys.py | On-chain key anchoring hooks |
| musehub/crypto/cids.py | New: object_id ↔ CIDv1 conversion |
| musehub/config.py | IPFS config, pinning service config |
| musehub/services/musehub_snapshot.py | Emit CARs on snapshot write |
| musehub/services/musehub_releases.py | Pin release artifacts to nft.storage |
| musehub/services/musehub_car_exporter.py | New: CAR file generation |
| musehub/services/musehub_federation.py | New: hub-to-hub sync |
| musehub/auth/request_signing.py | On-chain key verification path |
| musehub/db/musehub_models.py | ipfs_cid columns on objects/releases |
| deploy/ | Move away from ECR+SSM; add self-hosted deploy path |
Success Criteria
- A musician can muse pull a repo with the MuseHub server offline, using only IPFS CIDs
- Every pushed object has a stable CIDv1 resolvable via dweb.link or any IPFS gateway
- A musician can verify a commit's integrity using only the commit CID and their peer's public key, with no MuseHub required
- Releases are pinned to Filecoin; the ipfs_cid is shown on the release page
- A self-hosted MuseHub instance can replicate from staging.musehub.ai
- Public keys are anchored on-chain; MuseHub is a cache, not the authority
Prior Art / References
- IPFS CAR spec
- CIDv1 raw codec
- FIPS 204 ML-DSA-65 (already stubbed in musehub/crypto/keys.py)
- ERC-8004 agent identity standard (Avalanche L1 integration context)
- web3.storage CAR upload API
- Filecoin permanent storage via nft.storage