perf: parallel pack uploads — collapse M×T_pack to ≈T_pack wall-clock time
_push_objects_as_packs previously sent packs serially: pack 1 completes, then pack 2 starts, etc. For 2000 files (8 packs × 250 objects), that was 8 × 19s = 152s if each pack took 19s. After the server-side fixes, each pack takes ~1.5s — but serial = 8 × 1.5s = 12s while parallel = 1.5s.
Changes: - Build all packs in memory first (list[list[ObjectPayload]]) - Submit all packs simultaneously via ThreadPoolExecutor(_PACK_WORKERS=8) - Collect results with as_completed — re-raises TransportError on any failure - HttpTransport.push_object_pack is stateless per call (new urllib request each time), so concurrent calls from multiple threads are safe
_PACK_WORKERS = 8: covers a 2000-file repo in one wave (8 × 250 = 2000). Higher values give diminishing returns — bottleneck is server R2 pool (100).
PACK_MAX_OBJECTS stays at 250 (client conservative, 60s nginx timeout safety).
Combined with the server-side fixes (head_object removal, pool=100, shared semaphore), expected 2000-file push: ~1.5s total vs 78s before.
0 comments
muse hub commit comment sha256:03ab12d64c92db7b08c39a0f30b68244629bc2fa7c379b85ba3d37bf1fa4bcac --body "your comment"
No comments yet. Be the first to start the discussion.