gabriel / musehub public
feat patch phase5 task/phase1-object-store-invariant #2 / 6
AI Agent gabriel · 15 days ago · May 29, 2026 · Diff

feat(phase5): backfill script + tests for object store invariant (issue #63)

deploy/backfill_object_store.py pages through all MusehubCommit and MusehubSnapshot rows where storage_uri IS NULL and writes canonical muse binary objects to S3:
- commit <size>\0<json> for each commit
- snapshot <size>\0<json> for each snapshot (full manifest or reconstructed from delta chain via _reconstruct_manifest)
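The `<kind> <size>\0<json>` framing described above can be sketched as follows. This is an illustration only, not the project's `commit_to_bytes` / `snapshot_to_bytes` implementation: the helper names `frame_object` and `parse_object` are hypothetical, and it assumes `<size>` is the decimal byte length of the JSON body and that keys are sorted to keep the encoding deterministic.

```python
import json


def frame_object(kind: str, payload: dict) -> bytes:
    """Frame a JSON payload as b'<kind> <size>\\0<json>' (hypothetical helper)."""
    # Assumption: canonical JSON means sorted keys and compact separators.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Assumption: <size> is the byte length of the JSON body, in decimal.
    header = f"{kind} {len(body)}".encode("ascii")
    return header + b"\0" + body


def parse_object(raw: bytes) -> tuple[str, dict]:
    """Inverse of frame_object; validates the declared size against the body."""
    header, sep, body = raw.partition(b"\0")
    if not sep:
        raise ValueError("missing NUL separator between header and body")
    kind, size = header.decode("ascii").split(" ", 1)
    if int(size) != len(body):
        raise ValueError(f"declared size {size} != actual size {len(body)}")
    return kind, json.loads(body)


raw = frame_object("commit", {"message": "init"})
print(raw)  # b'commit 18\x00{"message":"init"}'
print(parse_object(raw))  # ('commit', {'message': 'init'})
```

Putting the size in the header lets a reader detect truncated objects before parsing the JSON body.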

Fully idempotent: rows with storage_uri already set are skipped, so a second run writes nothing. Returns BackfillStats with written/skipped/errors counts. Supports --dry-run and --batch flags for production use.
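A minimal in-memory sketch of the idempotent loop and the two flags follows. It is not the script itself: `LegacyRow`, `backfill`, `build_parser`, and the `s3://example-bucket/...` URI scheme are hypothetical, the real script works against an AsyncSession and S3, and it is an assumption here that `--batch` takes an integer page size and that a dry run counts would-be writes under `written`.

```python
from __future__ import annotations

import argparse
from dataclasses import dataclass
from typing import Callable, TypedDict


class BackfillStats(TypedDict):
    written: int
    skipped: int
    errors: int


@dataclass
class LegacyRow:
    """Hypothetical stand-in for a MusehubCommit or MusehubSnapshot row."""
    row_id: str
    payload: bytes
    storage_uri: str | None = None


def backfill(
    rows: list[LegacyRow],
    put: Callable[[str, bytes], None],
    *,
    batch: int = 100,
    dry_run: bool = False,
) -> BackfillStats:
    """Write every row that has no storage_uri; skip the rest."""
    stats: BackfillStats = {"written": 0, "skipped": 0, "errors": 0}
    for start in range(0, len(rows), batch):  # page through rows in batches
        for row in rows[start:start + batch]:
            if row.storage_uri is not None:  # already backfilled
                stats["skipped"] += 1
                continue
            uri = f"s3://example-bucket/objects/{row.row_id}"  # made-up URI scheme
            try:
                if not dry_run:
                    put(uri, row.payload)
                    row.storage_uri = uri  # marks the row as done for later runs
                stats["written"] += 1  # assumption: dry run counts would-be writes
            except Exception:
                stats["errors"] += 1
    return stats


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Object store backfill (sketch)")
    parser.add_argument("--dry-run", action="store_true", help="report without writing")
    parser.add_argument("--batch", type=int, default=100, help="rows per page (assumed)")
    return parser
```

Because a row's storage_uri is set only after a successful write, a failed row is retried on the next run and a completed row is never written twice.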

10 tests covering: commit backfill, snapshot backfill, delta-only snapshot reconstruction, idempotency, and stats accuracy.
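The delta-only case can be illustrated with a rough sketch of rebuilding a manifest from a delta chain. The commit does not show how `_reconstruct_manifest` works or how deltas are stored, so everything here is assumed for illustration: the `Snapshot` fields, the "set plus removed" delta shape, and the function name `reconstruct_manifest`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical snapshot record: either a full manifest or a delta on a parent."""
    snapshot_id: str
    parent_id: str | None = None
    manifest: dict[str, str] | None = None  # path -> object id, when stored in full
    delta_set: dict[str, str] = field(default_factory=dict)  # added or changed paths
    delta_removed: list[str] = field(default_factory=list)  # deleted paths


def reconstruct_manifest(snapshot_id: str, snapshots: dict[str, Snapshot]) -> dict[str, str]:
    """Walk up to the nearest full manifest, then replay deltas oldest-first."""
    chain: list[Snapshot] = []
    current = snapshots[snapshot_id]
    while current.manifest is None:
        chain.append(current)
        if current.parent_id is None:
            raise ValueError("delta chain has no full-manifest base")
        current = snapshots[current.parent_id]
    manifest = dict(current.manifest)  # copy so the base snapshot is not mutated
    for snap in reversed(chain):  # oldest delta first
        manifest.update(snap.delta_set)
        for path in snap.delta_removed:
            manifest.pop(path, None)
    return manifest


snapshots = {
    "s1": Snapshot("s1", manifest={"a.mid": "o1", "b.mid": "o2"}),
    "s2": Snapshot("s2", parent_id="s1", delta_set={"b.mid": "o3"}, delta_removed=["a.mid"]),
    "s3": Snapshot("s3", parent_id="s2", delta_set={"c.mid": "o4"}),
}
print(reconstruct_manifest("s3", snapshots))  # {'b.mid': 'o3', 'c.mid': 'o4'}
```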

148 total tests GREEN.

commit sha256:6191353352f5187f4dc208220e243f3cf703ac322cad8018775716e9d21bce86
snapshot sha256:8acb06b900d9df767c61725a34978bea8e655dd34a029d219d305811fff99422
+53 symbols added, 0 dead code introduced
Semantic Changes 53 symbols
~ deploy/backfill_object_store.py (20 symbols added)
+ class BackfillStats L46–52
+ async function _backfill_commits L82–126
+ async function _backfill_snapshots L129–181
+ async function _main L184–206
+ async function backfill_object_store L55–79
+ import AsyncSession L37–37
+ import MusehubCommit L39–39
+ import MusehubSnapshot L39–39
+ import TypedDict L33–33
+ import _reconstruct_manifest L41–41
+ import annotations L27–27
+ import argparse L29–29
+ import asyncio L30–30
+ import commit_to_bytes L40–40
+ import logging L31–31
+ import msgpack L35–35
+ import sa L36–36
+ import snapshot_to_bytes L40–40
+ import sys L32–32
+ variable logger L43–43
+ class TestBackfillCommits L90–148
+ async method test_backfill_skips_commits_already_in_s3 L131–148
+ async method test_commit_s3_bytes_are_muse_binary_format L109–129
+ async method test_commit_storage_uri_populated_after_backfill L91–107
+ class TestBackfillSnapshots L155–277
+ async method test_backfill_skips_snapshots_already_in_s3 L259–277
+ async method test_snapshot_delta_only_reconstructed_and_backfilled L216–257
+ async method test_snapshot_manifest_correct_after_backfill L197–214
+ async method test_snapshot_s3_bytes_are_muse_binary_format L174–195
+ async method test_snapshot_storage_uri_populated_after_backfill L156–172
+ class TestBackfillStats L284–314
+ async method test_idempotent_second_run_writes_nothing L294–314
+ async method test_stats_zero_when_nothing_to_backfill L285–292
+ function _backend L38–40
+ function _make_legacy_commit L57–69
+ function _make_legacy_snapshot L72–83
+ async function _make_repo L43–54
+ function _uid L34–35
+ import AsyncSession L21–21
+ import MusehubCommit L25–25
+ import MusehubCommitRef L25–25
+ import MusehubRepo L25–25
+ import MusehubSnapshot L25–25
+ import MusehubSnapshotRef L25–25
+ import annotations L14–14
+ import compute_identity_id L24–24
+ import compute_repo_id L24–24
+ import datetime L17–17
+ import fake_id L23–23
+ import json L16–16
+ import msgpack L19–19
+ import pytest L20–20
+ import secrets L18–18
