gabriel / musehub public
Open #71 Bug
filed by gabriel human · 7 days ago

Implement true soft-delete for repos with cascade tombstoning

0 Anchors
Blast radius
Churn 30d
0 Proposals

Overview

There is a critical discrepancy between the API contract and the implementation for repo deletion.

The route docstring claims:

"Marks the repo as deleted by recording a deleted_at timestamp; all data is retained in the database for audit purposes."

The actual service implementation in musehub/services/musehub_repository.py::delete_repo (line 293):

async def delete_repo(session: AsyncSession, repo_id: str) -> bool:
    """Hard-delete a repo and all its cascade-deleted dependents."""
    row = await session.get(MusehubRepo, repo_id)
    if row is None:
        return False
    await session.delete(row)   # ← hard delete, not soft delete
    await session.flush()
    return True

The MusehubRepo model does have a deleted_at column (added in the consolidated schema), but it is never set by this function. Calling DELETE /repos/{repo_id} today hard-deletes the repo row and cascades to all dependents — issues, proposals, comments, collaborators, webhooks, objects — permanently and immediately.

This must be fixed before any repo deletion is exposed in the UI or invoked during the rc11 deploy cleanup, as it is irreversible.


Phase 1 — Audit the full cascade surface

Before writing any fix, map exactly what a hard delete currently destroys via DB cascade. Check the ORM model relationships and any ON DELETE CASCADE constraints in the schema:

muse -C ~/ecosystem/musehub content-grep "CASCADE|ondelete" --json
muse -C ~/ecosystem/musehub code symbols --file musehub/db/musehub_repo_models.py --json

Produce a complete list of every table/model that would be affected by deleting a MusehubRepo row. Expected candidates:

  • musehub_issues + musehub_issue_comments + musehub_issue_labels
  • musehub_proposals + musehub_proposal_comments
  • musehub_collaborators
  • musehub_webhooks + musehub_webhook_deliveries
  • musehub_objects (already has its own deleted_at)
  • musehub_branches, musehub_commits, musehub_snapshots
  • Fork relationships (fork_parent_id foreign keys)

Phase 2 — Fix delete_repo to set deleted_at instead of hard-deleting

Replace the hard delete with a tombstone:

async def delete_repo(session: AsyncSession, repo_id: str) -> bool:
    """Soft-delete a repo by recording a deleted_at timestamp.

    All associated data is retained in the database for audit purposes.
    Subsequent reads return 404. Returns True if the repo existed, False if not found.
    The caller is responsible for committing the session.
    """
    row = await session.get(MusehubRepo, repo_id)
    if row is None:
        return False
    if row.deleted_at is not None:
        return True  # idempotent — already deleted
    row.deleted_at = datetime.now(timezone.utc)
    await session.flush()
    logger.info("✅ Soft-deleted MuseHub repo %s", repo_id)
    return True

Verify that deleted_at is already on the MusehubRepo ORM model. If not, add a migration to add the column (check musehub/db/musehub_repo_models.py line 276 — it appears to be there already).


Phase 3 — Cascade tombstone to child resources

A soft-deleted repo must hide all its children. Dependents must be tombstoned in the same transaction so they are invisible in list queries without requiring a JOIN to the parent repo on every query.

Resources to tombstone when a repo is soft-deleted:

  • musehub_issues → set deleted_at = now() where repo_id = :repo_id and deleted_at IS NULL
  • musehub_issue_comments → set is_deleted = true where issue_id IN (SELECT issue_id FROM musehub_issues WHERE repo_id = :repo_id)
  • musehub_proposals → set deleted_at = now() (confirm column exists; add migration if not)
  • musehub_proposal_comments → set is_deleted = true via proposal subquery

Resources that do NOT need tombstoning (filtered by repo lookup upstream):

  • musehub_objects, musehub_branches, musehub_commits — these are already hidden once the repo row returns 404; no independent list endpoint exposes them without a repo context

Implement the cascade as explicit bulk UPDATE statements within delete_repo (not ORM-level cascade), so the tombstone is atomic with the repo's own deleted_at set in a single flush.


Phase 4 — Verify all read paths filter on deleted_at

After the tombstone is set, every read path that queries a repo must filter it out. Audit these surfaces:

muse -C ~/ecosystem/musehub content-grep "get_repo|list_repos|MusehubRepo" --json

For each query that touches MusehubRepo, confirm it includes:

MusehubRepo.deleted_at.is_(None)

Known surfaces to check:

  • musehub_repository.get_repo — is deleted_at IS NULL already in the WHERE clause?
  • musehub_repository.list_repos_for_user
  • musehub_repository.list_repo_forks / list_repo_forks_flat
  • Sitemap generator (musehub/api/routes/musehub/sitemap.py)
  • MCP write tools (musehub/mcp/write_tools/repos.py)
  • Push / fetch / clone handlers — a deleted repo must reject incoming writes

Any path that does not filter deleted_at is a data-leak bug — fix it in this phase.


Phase 5 — muse hub repo delete CLI — verify end-to-end

Once the service is fixed, verify the full CLI path:

# Create a throwaway repo
muse -C ~/ecosystem/musehub hub repo create --name test-soft-delete-$(date +%s) --visibility private --json

# Delete it
muse -C ~/ecosystem/musehub hub repo delete --json

# Confirm 404 on subsequent read
muse -C ~/ecosystem/musehub hub repo read --json   # must return 404

# Confirm row still exists in DB with deleted_at set (via SSM):
# SELECT repo_id, slug, deleted_at FROM musehub_repos WHERE repo_id = '<id>';

Idempotency check: calling delete twice must return success, not 404 on second call.


Phase 6 — Migration: add deleted_at to musehub_proposals if missing

Phase 3 assumes musehub_proposals has a deleted_at column. If it does not, write a migration:

op.add_column('musehub_proposals',
    sa.Column('deleted_at', sa.DateTime(timezone=True), nullable=True))
op.create_index('ix_musehub_proposals_deleted_at', 'musehub_proposals', ['deleted_at'])

Add the corresponding ORM column to the MusehubProposal model. Without it, proposals belonging to a soft-deleted repo remain visible in any proposal list query that does not join to musehub_repos.


Acceptance criteria

[ ] DELETE /repos/{repo_id} sets deleted_at on the repo row — no hard delete
[ ] Deleted repo returns 404 on all read paths (GET, ls-remote, push, fetch, clone)
[ ] Issues belonging to the repo are tombstoned (deleted_at set) in the same transaction
[ ] Issue comments belonging to those issues are tombstoned (is_deleted = true)
[ ] Proposals belonging to the repo are tombstoned
[ ] Proposal comments belonging to those proposals are tombstoned
[ ] All MusehubRepo list/get queries filter deleted_at IS NULL
[ ] Deleting an already-deleted repo is idempotent (returns success, not 404)
[ ] DB row is preserved — a hard delete never occurs via this path
[ ] Migration added for musehub_proposals.deleted_at if column was missing
[ ] End-to-end CLI test passes (create → delete → read returns 404 → DB row survives)
Activity
gabriel opened this issue 7 days ago
No activity yet. Use the CLI to comment.