gabriel / musehub public
Open #30
filed by gabriel human · 38 days ago

Decentralize MuseHub: IPFS-backed, censorship-resistant, trustless object storage

Vision

MuseHub today is a capable, content-addressed VCS server, but it runs on one AWS EC2 instance in us-east-1, stores blobs in Cloudflare R2 (or on local disk), and routes all traffic through Cloudflare's proxy. Any of those three providers can take the service down, rate-limit it, or be coerced by a state actor into doing so. That is the opposite of what a sovereign, musician-owned platform should be.

The goal of this issue is to make MuseHub structurally incapable of being censored — not just "resilient" in the cloud sense (multi-AZ, CDN edge nodes), but genuinely decentralised: no single operator can unilaterally delete a repo, silence an identity, or cut off access to committed objects.


Where We Are Now

Infrastructure single points of failure

| Layer | Current | Risk |
|---|---|---|
| Compute | EC2 i-0855d6efe7fa1a49d (us-east-1) | AWS ToS, region outage, IAM mishap |
| TLS termination | Cloudflare Full (Strict); security group admits only Cloudflare IPs | Cloudflare can block the domain |
| Object storage | Cloudflare R2 (S3Backend) or local /data/musehub/objects | R2 account suspended = objects gone |
| Database | Postgres on the same instance | One disk, no replicas, no HA |
| Auth | Ed25519 keys verified server-side by MuseHub | Server = trust anchor |
| Deploy pipeline | AWS ECR + SSM | AWS credentials control the gate |

Code: musehub/storage/backends.py::get_backend selects R2 or local. musehub/config.py holds all the R2 credentials. Deploy is Docker via deploy/push.sh + ECR.


The Target State

Every object a musician commits should be pinned in IPFS and retrievable by CID regardless of whether musehub.ai is alive. Every identity should be verifiable on-chain. No single server should hold the only copy of a commit, a snapshot, or a release.


Proposed Layers

1. IPFS as a first-class storage backend

Add IPFSBackend alongside LocalBackend and S3Backend in musehub/storage/backends.py.

class IPFSBackend:
    """Content-addressed storage backed by IPFS (via Kubo HTTP API or web3.storage/nft.storage).

    URI scheme: ipfs://<CIDv1>
    
    Every put() pins the object and records the CID. get() uses the gateway
    (configurable: local node, infura, dweb.link) with local-node fallback.
    exists() = check the local pin set first, then the DHT.
    """
    async def put(self, object_id: str, data: bytes) -> str: ...
    async def get(self, object_id: str) -> bytes | None: ...
    async def pin(self, cid: str) -> None: ...  # explicit pin for persistence
    async def exists(self, object_id: str) -> bool: ...  # pin set first, then DHT

The object_id → CID mapping must be persisted in musehub_objects so the DB remains the single source of truth for what a repo contains, even if the retrieval path changes.

Dual-write phase (non-breaking): Write to both R2 and IPFS on put. Read from R2 first, fall back to IPFS. This lets us migrate without a big-bang cutover.
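Below is a minimal sketch of the dual-write phase. It assumes both backends expose the async put/get shape shown in the IPFSBackend stub. DualWriteBackend is a hypothetical name. During Phase 1, get_backend() would presumably return this wrapper rather than a bare IPFSBackend(), so that R2 keeps receiving writes.

```python
from __future__ import annotations

import asyncio
from typing import Protocol


class ObjectBackend(Protocol):
    """The subset of the backend interface this sketch relies on (assumed)."""

    async def put(self, object_id: str, data: bytes) -> str: ...
    async def get(self, object_id: str) -> bytes | None: ...


class DualWriteBackend:
    """Hypothetical Phase 1 wrapper: write to R2 and IPFS, read R2 first."""

    def __init__(self, primary: ObjectBackend, secondary: ObjectBackend) -> None:
        self.primary = primary      # e.g. the existing S3Backend (R2)
        self.secondary = secondary  # e.g. the new IPFSBackend

    async def put(self, object_id: str, data: bytes) -> str:
        # Write to both concurrently. If either write raises, gather()
        # propagates the exception and this put fails as a whole.
        primary_uri, _ = await asyncio.gather(
            self.primary.put(object_id, data),
            self.secondary.put(object_id, data),
        )
        return primary_uri  # R2 stays the authoritative URI during migration

    async def get(self, object_id: str) -> bytes | None:
        data = await self.primary.get(object_id)
        if data is None:  # not in R2 (lost, or never written there)
            data = await self.secondary.get(object_id)
        return data
```

In this sketch a failed IPFS write fails the whole put. A softer policy would log the error and queue a backfill, so that an IPFS outage cannot block pushes during the migration.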

Pinning services to integrate:

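  • Kubo (self-hosted, POST /api/v0/block/put with cid-codec=raw so the CID matches object_id; /api/v0/add would chunk large objects into a UnixFS DAG with a different CID)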
  • web3.storage (car upload API)
  • Pinata (pinning API)
  • nft.storage (for immutable releases)
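For the self-hosted Kubo option, one way to keep the object_id to CID mapping exact is to store every object as a single raw block rather than going through add. The sketch below assumes Kubo's RPC endpoints /api/v0/block/put (with cid-codec=raw, mhtype=sha2-256, pin=true) and /api/v0/block/get. The KuboBlockStore name and the injectable post transport are hypothetical. A production version would more likely use an async HTTP client than urllib in a worker thread.

```python
import asyncio
import json
import urllib.parse
import urllib.request
import uuid
from typing import Callable

# HTTP transport: (url, body, content_type) -> response body. Injectable for tests.
Post = Callable[[str, bytes, str], bytes]


def urllib_post(url: str, body: bytes, content_type: str) -> bytes:
    """Blocking POST; the Kubo RPC API only accepts POST requests."""
    req = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


class KuboBlockStore:
    """Hypothetical helper: each Muse object becomes one raw, pinned IPFS block."""

    def __init__(self, api_url: str, post: Post = urllib_post) -> None:
        self.api_url = api_url.rstrip("/")
        self._post = post

    async def _call(self, command: str, params: dict[str, str], body: bytes = b"",
                    content_type: str = "application/octet-stream") -> bytes:
        url = f"{self.api_url}/api/v0/{command}?{urllib.parse.urlencode(params)}"
        # Run the blocking transport in a worker thread to keep the event loop free.
        return await asyncio.to_thread(self._post, url, body, content_type)

    async def put(self, data: bytes) -> str:
        """Store data as a raw sha2-256 block, pin it, and return Kubo's CID."""
        boundary = uuid.uuid4().hex
        body = (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="data"; filename="data"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
        resp = await self._call(
            "block/put",
            {"cid-codec": "raw", "mhtype": "sha2-256", "pin": "true"},
            body,
            f"multipart/form-data; boundary={boundary}",
        )
        return json.loads(resp)["Key"]

    async def get(self, cid: str) -> bytes:
        """Fetch a block's raw bytes through the node (local store or network)."""
        return await self._call("block/get", {"arg": cid})
```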

Config additions in musehub/config.py:

ipfs_api_url: str | None = None        # http://localhost:5001 for local Kubo
ipfs_gateway_url: str = "https://dweb.link"
ipfs_pinning_service: str | None = None  # "pinata" | "web3storage" | "nftstorage"
ipfs_pinning_api_key: str | None = None

get_backend() in musehub/storage/backends.py gains a new branch:

if settings.ipfs_api_url:
    return IPFSBackend()

2. Content-addressed objects already have CIDs

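This is the key insight: Muse objects are already content-addressed by SHA-256. A sha256:<hex> ID is not itself a CID, but it carries exactly the digest a CIDv1 needs. Prefixing that digest with the CID version, the raw codec (0x55) and the sha2-256 multihash header turns it into a real IPFS CID. Wrapping the bytes in a UnixFS block would change the hash, so the raw codec is what keeps the mapping lossless:

CIDv1 = "b" + base32lower_nopad( 0x01 (CID version) + 0x55 (raw codec) + 0x12 (sha2-256) + 0x20 (digest length, 32) + sha256_digest )

Each prefix field is a varint, but all four values are below 0x80 and take one byte each. The leading "b" is the multibase prefix for lowercase base32 and sits outside the encoded bytes.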

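We can compute the CID deterministically from object_id without hitting IPFS at all, making the mapping lossless, provided each object is stored in IPFS as a single raw block. ipfs add chunks anything over its default 256 KiB chunk size into a UnixFS DAG whose root CID differs. Kubo's block/put refuses blocks over 1 MiB unless --allow-big-block is passed, and such blocks may not transfer over Bitswap, so very large objects need their own plan.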

Implementation lives in a new musehub/crypto/cids.py:

def object_id_to_cid(object_id: str) -> str:
    """Convert sha256:<hex> object_id to a CIDv1 raw-codec string."""
    ...

def cid_to_object_id(cid: str) -> str:
    """Inverse: extract the sha256 hex from a CIDv1 raw-codec CID."""
    ...

This means every musehub_objects.object_id has a canonical IPFS CID already — we just haven't been exposing it.
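The two stubs can be filled in with the standard library alone. This sketch handles only lowercase base32 CIDs (multibase prefix "b", the form Kubo prints for CIDv1) with the raw codec and sha2-256. verify_block is a hypothetical extra helper for checking integrity from a CID alone.

```python
import base64
import hashlib

# Binary CIDv1 header for a raw block hashed with SHA-256. Each field is an
# unsigned varint; all four values are below 0x80, so each is a single byte.
_CID_PREFIX = bytes([
    0x01,  # CID version 1
    0x55,  # multicodec: raw
    0x12,  # multihash function: sha2-256
    0x20,  # digest length: 32 bytes
])


def object_id_to_cid(object_id: str) -> str:
    """Convert a sha256:<hex> object_id to a CIDv1 raw-codec string."""
    algo, sep, hexdigest = object_id.partition(":")
    if algo != "sha256" or not sep or len(hexdigest) != 64:
        raise ValueError(f"not a sha256 object_id: {object_id!r}")
    binary = _CID_PREFIX + bytes.fromhex(hexdigest)
    # Multibase "b" = lowercase RFC 4648 base32 without padding.
    return "b" + base64.b32encode(binary).decode("ascii").lower().rstrip("=")


def cid_to_object_id(cid: str) -> str:
    """Inverse: extract the sha256 hex from a CIDv1 raw-codec CID."""
    if not cid.startswith("b"):
        raise ValueError("only base32 (multibase 'b') CIDs are handled here")
    body = cid[1:].upper()
    binary = base64.b32decode(body + "=" * (-len(body) % 8))  # restore padding
    if len(binary) != 36 or binary[:4] != _CID_PREFIX:
        raise ValueError("not a CIDv1 raw sha2-256 CID")
    return "sha256:" + binary[4:].hex()


def verify_block(cid: str, data: bytes) -> bool:
    """True if data hashes to the digest embedded in cid."""
    return cid_to_object_id(cid) == "sha256:" + hashlib.sha256(data).hexdigest()
```

For sha256:<hex> IDs the result is always a 59-character string beginning with bafkrei. It should match the CID Kubo reports for the same bytes stored via block/put with its default raw codec and sha2-256.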

3. On-chain identity anchoring (ERC-8004 Agent Identity Standard)

MSign is great — Ed25519 keys, no passwords, no JWTs. But the trust anchor is still MuseHub's Postgres database. If the DB is wiped, all identities are gone.

Anchor public keys on the Avalanche L1. Each handle maps to an on-chain record:

handle → {ed25519_public_key, ipfs_cid_of_key_bundle, nonce, revocation_list}

Auth flow becomes:

  1. Client signs with Ed25519 (unchanged).
  2. MuseHub verifies the sig (unchanged via musehub/auth/request_signing.py).
  3. MuseHub additionally checks that the key is anchored on-chain (cached, refreshed on key rotation).

musehub/crypto/keys.py already designs for algorithm agility (KeyAlgorithm.ED25519, ML-DSA-65 placeholder). The on-chain record can be upgraded to post-quantum keys without breaking the DB layer — the chain holds the authoritative key, not the DB.
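Step 3 of the auth flow could be implemented as a small cache in front of the chain lookup. In this sketch the chain read is an injected function, because the actual RPC call and contract layout are not specified here. Entries expire after a TTL, and invalidate() is meant to be called when a key-rotation event is seen. AnchoredKeyCache and its parameters are hypothetical.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable, Optional

# Reads the anchored public key for a handle from the chain; None if unanchored.
ChainLookup = Callable[[str], Optional[bytes]]


@dataclass
class AnchoredKeyCache:
    """Hypothetical TTL cache in front of the on-chain handle -> key record."""

    lookup: ChainLookup
    ttl_seconds: float = 300.0
    clock: Callable[[], float] = time.monotonic
    _entries: dict[str, tuple[Optional[bytes], float]] = field(default_factory=dict)

    def anchored_key(self, handle: str) -> Optional[bytes]:
        entry = self._entries.get(handle)
        if entry is not None and self.clock() - entry[1] < self.ttl_seconds:
            return entry[0]  # fresh cache hit (may be a cached "unanchored")
        key = self.lookup(handle)
        self._entries[handle] = (key, self.clock())
        return key

    def invalidate(self, handle: str) -> None:
        """Drop the cached entry, e.g. when a key-rotation event is observed."""
        self._entries.pop(handle, None)

    def is_anchored(self, handle: str, presented_key: bytes) -> bool:
        """The key behind an already-verified signature must match the chain."""
        return self.anchored_key(handle) == presented_key
```

The TTL bounds how long a revoked key can still be accepted if a rotation event is missed. It therefore trades chain load against revocation latency.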

4. Commit manifests pinned as CAR files

Every muse push produces a bundle of objects + a commit + a snapshot manifest. That bundle should become a Content ARchive (CAR) pinned atomically to IPFS.

CAR file structure:

CARv1 header
  roots: [commit_cid]
blocks:
  commit_cid        → msgpack(commit)
  snapshot_cid      → msgpack(snapshot manifest)
  object_cid[0..n] → raw object bytes

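This is a complete, self-verifying snapshot of a repo state at a point in time. Anyone holding the CAR can check every block against its CID and reconstruct the commit without MuseHub. Fetching by root CID alone over IPFS additionally requires every block to be pinned. IPFS cannot parse msgpack, so it cannot follow links from commit_cid to the snapshot and objects. Encoding the commit and snapshot as DAG-CBOR with CID links would lift that restriction.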

New endpoint: GET /{owner}/{repo}/commits/{ref}/car — stream the CAR file. New service: musehub/services/musehub_car_exporter.py

5. DHT-backed repo discovery (no DNS required)

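If musehub.ai DNS is seized, users still need to find each other. A libp2p DHT can store repo advertisements, with one caveat. Nodes on the public IPFS DHT only accept records in namespaces they know how to validate (public keys and IPNS). A custom /musehub/ key therefore needs either a MuseHub-specific DHT whose nodes validate it, or an IPNS record signed by the owner's key: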

DHT key:   /musehub/repo/{owner}/{slug}
DHT value: {
  hub_url: "https://musehub.ai",
  fallback_peer_ids: ["12D3KooW..."],
  latest_head_cid: "bafy...",
  signed_by: <ed25519 sig of owner>
}

Muse CLI's push command would also publish to the DHT (behind a feature flag). muse pull would resolve from DHT if the primary hub is unreachable.
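Whatever transport ends up carrying the advertisement, the signature has to cover a canonical encoding of the record. This sketch signs sorted-key, whitespace-free JSON of every field except signed_by. The Ed25519 sign and verify operations are passed in as functions, for example wrapping an Ed25519 library on the CLI side. All names are hypothetical.

```python
import json
from typing import Callable

Signer = Callable[[bytes], bytes]          # Ed25519 sign with the owner's private key
Verifier = Callable[[bytes, bytes], bool]  # (message, signature) -> valid? (closes over the owner's public key)


def advert_payload(advert: dict) -> bytes:
    """Canonical bytes to sign: all fields except signed_by, sorted keys, no spaces."""
    unsigned = {k: v for k, v in advert.items() if k != "signed_by"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_advert(fields: dict, sign: Signer) -> dict:
    """Return a copy of fields with signed_by set to the hex-encoded signature."""
    advert = dict(fields)
    advert["signed_by"] = sign(advert_payload(advert)).hex()
    return advert


def verify_advert(advert: dict, verify: Verifier) -> bool:
    """Reject adverts whose signature is missing, malformed, or does not verify."""
    sig = advert.get("signed_by")
    if not isinstance(sig, str):
        return False
    try:
        sig_bytes = bytes.fromhex(sig)
    except ValueError:
        return False
    return verify(advert_payload(advert), sig_bytes)
```

As proposed, the record carries no sequence number, so an old but validly signed advert pointing at a stale head could be replayed. Adding a monotonically increasing seq field (as IPNS records do) and preferring the highest one would close that gap.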

6. Federated hubs (MuseHub-to-MuseHub sync)

The protocol between Muse CLI and MuseHub is already well-defined (push, pull, filter-objects, coord). The same protocol can run hub-to-hub:

  • A musician self-hosts a MuseHub instance (Docker Compose, a Raspberry Pi, a VPS).
  • They designate it as a replica of their repos on staging.musehub.ai.
  • On every push, MuseHub fans out to all registered replica hubs.
  • Consumers can pull from any hub that has the content.

New table:

create table musehub_federation_peers (
  peer_id        uuid primary key,
  hub_url        text not null,
  owner_handle   text,  -- null = mirrors everything
  last_sync_at   timestamptz,
  sync_status    text check (sync_status in ('active', 'lagging', 'offline'))
);

New service: musehub/services/musehub_federation.py
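The following is a minimal sketch of the push fan-out. It assumes each replica accepts the same push protocol the CLI uses, wrapped here as an injected push(hub_url, owner, repo) coroutine. The status policy is an assumption: a timeout means 'lagging' and a connection error means 'offline'. FederationPeer simply mirrors the columns above.

```python
import asyncio
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Awaitable, Callable, Optional


@dataclass
class FederationPeer:
    """In-memory mirror of one musehub_federation_peers row (peer_id omitted)."""
    hub_url: str
    owner_handle: Optional[str] = None  # None = mirrors everything
    last_sync_at: Optional[datetime] = None
    sync_status: str = "active"         # 'active' | 'lagging' | 'offline'


# Replays one repo's new objects and refs onto one hub; raises on failure.
PushFn = Callable[[str, str, str], Awaitable[None]]


async def fan_out(peers: list[FederationPeer], owner: str, repo: str,
                  push: PushFn, timeout: float = 30.0) -> None:
    """Push to every replica that mirrors this owner, concurrently, updating status."""
    targets = [p for p in peers if p.owner_handle in (None, owner)]

    async def sync_one(peer: FederationPeer) -> None:
        try:
            await asyncio.wait_for(push(peer.hub_url, owner, repo), timeout)
        except asyncio.TimeoutError:
            peer.sync_status = "lagging"  # reachable but too slow (assumed policy)
        except OSError:
            peer.sync_status = "offline"  # e.g. connection refused or DNS failure
        else:
            peer.sync_status = "active"
            peer.last_sync_at = datetime.now(timezone.utc)
        # Any other exception propagates out of gather() below.

    await asyncio.gather(*(sync_one(p) for p in targets))
```

Running the fan-out concurrently with a per-peer timeout keeps one dead Raspberry Pi from delaying the others. In practice it would run after the originating push has committed, for example as a background task, so replicas never slow down the musician's own push.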

7. Immutable releases pinned to nft.storage

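musehub/services/musehub_releases.py and musehub_release_packager.py already package releases. Release artifacts (.muse-release bundles) should be pinned to nft.storage, which stores IPFS content with Filecoin backing. Check the service's current terms before building on it: nft.storage ended free uploads on its original service in 2024, so this step may need to target one of the other pinning services listed above.

After pinning, record the IPFS CID in musehub_releases.ipfs_cid. The release page shows the IPFS link alongside the MuseHub URL. Even if MuseHub disappears, the release stays retrievable by CID for as long as some node, or a renewed Filecoin storage deal, keeps a copy.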
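Below is a sketch of the pin-and-record step in musehub_releases.py. The pinning service is hidden behind an injected pin() coroutine because each provider's upload API differs. Release and pin_release are hypothetical names. For a multi-megabyte bundle, the CID a service reports is typically the root of a chunked UnixFS DAG, so it will not equal the raw-codec CID derived from the bundle's SHA-256.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

# Uploads bytes to the configured pinning service and returns the CID it reports.
# Assumed interface: each provider's real upload API differs.
Pinner = Callable[[bytes], Awaitable[str]]


@dataclass
class Release:
    """Only the fields this sketch touches; ipfs_cid is the proposed new column."""
    tag: str
    ipfs_cid: Optional[str] = None


async def pin_release(release: Release, artifact: bytes, pin: Pinner,
                      gateway: str = "https://dweb.link") -> str:
    """Pin the .muse-release bundle once, record its CID, return a gateway link."""
    if release.ipfs_cid is None:  # re-running does not upload the bundle again
        release.ipfs_cid = await pin(artifact)
    return f"{gateway.rstrip('/')}/ipfs/{release.ipfs_cid}"


if __name__ == "__main__":
    async def fake_pin(data: bytes) -> str:
        return "bafybeiexample"  # stand-in for a real pinning-service call

    print(asyncio.run(pin_release(Release(tag="v1.0"), b"bundle bytes", fake_pin)))
```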

8. Mist domain: decentralized namespace

musehub/services/musehub_mists.py handles .muse domain registration. Centralised DNS for a decentralised VCS is a contradiction. Long-term, Mist domains should resolve on-chain:

  • ENS-style: gabriel.muse → on-chain record → hub URL + IPFS CID of HEAD
  • Or: Avalanche L1 native name service, integrated with ERC-8004 agent identity

Migration Path (phased, non-breaking)

| Phase | Scope | Breaking? |
|---|---|---|
| 0 | Add musehub/crypto/cids.py: compute IPFS CIDs from existing object_ids | No |
| 1 | IPFSBackend: dual-write to R2 + IPFS; read from R2 first | No |
| 2 | Expose GET /{owner}/{repo}/commits/{ref}/car (CAR export endpoint) | No |
| 3 | Anchor public keys on-chain; request_signing.py does on-chain lookup with DB cache | No |
| 4 | DHT repo advertisement in muse push (feature-flagged) | No |
| 5 | Federation peer table + fan-out on push | No |
| 6 | nft.storage pin on musehub_releases create | No |
| 7 | Mist domains → on-chain resolution | Gradual |

Key Files

| File | Role in decentralization |
|---|---|
| musehub/storage/backends.py | Add IPFSBackend; update get_backend() |
| musehub/crypto/keys.py | On-chain key anchoring hooks |
| musehub/crypto/cids.py | New: object_id ↔ CIDv1 conversion |
| musehub/config.py | IPFS config, pinning service config |
| musehub/services/musehub_snapshot.py | Emit CARs on snapshot write |
| musehub/services/musehub_releases.py | Pin release artifacts to nft.storage |
| musehub/services/musehub_car_exporter.py | New: CAR file generation |
| musehub/services/musehub_federation.py | New: hub-to-hub sync |
| musehub/auth/request_signing.py | On-chain key verification path |
| musehub/db/musehub_models.py | ipfs_cid columns on objects/releases |
| deploy/ | Move away from ECR+SSM; add self-hosted deploy path |

Success Criteria

  • A musician can muse pull a repo with the MuseHub server offline, using only IPFS CIDs
  • Every pushed object has a stable CIDv1 resolvable via dweb.link or any IPFS gateway
  • A musician can verify a commit's integrity using only the commit CID and their peer's public key — no MuseHub required
  • Releases are pinned to Filecoin; the ipfs_cid is shown on the release page
  • A self-hosted MuseHub instance can replicate from staging.musehub.ai
  • Public keys are anchored on-chain; MuseHub is a cache, not the authority
