Skip to content

feat: support out-of-order chunked uploads in Local adapter#162

Merged
TorstenDittmann merged 9 commits intomainfrom
feat-out-of-order-chunk-uploads
Apr 27, 2026
Merged

feat: support out-of-order chunked uploads in Local adapter#162
TorstenDittmann merged 9 commits intomainfrom
feat-out-of-order-chunk-uploads

Conversation

@TorstenDittmann
Copy link
Copy Markdown
Contributor

Summary

This PR enables out-of-order chunked uploads for the Local storage adapter by replacing the fragile _chunks.log file with .part.{n} files as the single source of truth.

Problem

The Local adapter previously tracked uploaded chunks via a _chunks.log file. This had several issues:

  • No duplicate protection in uploadData() — re-uploading a chunk would append duplicate log entries, inflating the count and potentially deadlocking the upload.
  • Write-before-move ordering in upload() — the log entry was written before the part file was moved. If the move failed, the log and disk were out of sync.
  • No out-of-order support verified — all existing tests uploaded chunks sequentially, so non-sequential uploads were not covered.

Solution

Local Adapter (src/Storage/Device/Local.php)

  • Removed _chunks.log entirely.
  • Added countChunks() helper that uses glob() to count existing .part.{n} files in the temp directory.
  • Fixed upload() and uploadData() to:
    • Write/move the part file first.
    • Count parts on disk after the write succeeds.
    • Trigger assembly only when the count matches $chunks.
    • Silently skip duplicate re-uploads if the .part.{n} file already exists.
  • Fixed joinChunks() to no longer attempt to delete the (now non-existent) log file.

Tests

  • testOutOfOrderUpload — uploads chunks 3→1→2 and verifies correct assembly.
  • testOutOfOrderUploadWithRetry — uploads 2→1→retry 2→3, verifying duplicates are ignored.
  • testJoinChunksMissingPartDoesNotFinalize — updated to reflect the safer new behavior (missing parts simply prevent finalization instead of throwing during assembly).
  • testOutOfOrderPartUpload (S3) — uploads all chunks of a large file in reverse order to verify S3 natively supports out-of-order multipart uploads.

Misc

  • Added .gitignore to tests/resources/disk-a/ to prevent test artifacts from being tracked.

Testing

All upload-related tests pass:

OK (9 tests, 29 assertions)

The remaining failures in the full suite (testTransferLarge, testTransferSmall, testGetFiles) are pre-existing environment issues (missing AWS credentials / macOS permissions) and are unrelated to these changes.

- Remove fragile _chunks.log file and use .part.{n} files as single source of truth
- Fix duplicate chunk detection in uploadData() to match upload()
- Write part files before counting to avoid count/desync issues
- Add countChunks() helper using glob() for reliable chunk counting
- Add out-of-order upload tests for Local and S3 adapters
- Add .gitignore to disk-a test resources to ignore test artifacts
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This PR replaces the _chunks.log file in the Local adapter with .part.{n} files as the source of truth for tracking chunked uploads, enabling out-of-order uploads, fixing duplicate chunk handling, and adding glob-metacharacter escaping in countChunks(). It also adds ksort() to S3's completeMultipartUpload for out-of-order part support, upgrades the Dockerfile to PHP 8.3, and adds supporting tests.

  • P1 — double assembly race: countChunks() is called unconditionally after the duplicate-skip branch in both upload() and uploadData(). If a duplicate chunk arrives while all other parts are already on disk, joinChunks() fires again. Two concurrent workers can both observe countChunks() == $chunks and open the same $tmpAssemble handle simultaneously, producing a corrupt or empty final file before the winning rename() fires. The fix is to return early from the duplicate path without calling countChunks() or joinChunks().

Confidence Score: 3/5

Not safe to merge as-is: a race condition in the duplicate-chunk path can trigger double assembly and corrupt the output file under concurrent load.

One P1 finding (double joinChunks() race via unconditional countChunks() after duplicate detection) pulls the score below the P1 ceiling of 4. The rest of the changes are clean and well-tested.

src/Storage/Device/Local.php — both upload() and uploadData() need an early return in the duplicate branch before countChunks() is called.

Important Files Changed

Filename Overview
src/Storage/Device/Local.php Core chunked upload refactor: replaces _chunks.log with .part.{n} files; adds countChunks() with glob-metachar escaping; fixes joinChunks() cleanup. One P1: countChunks() called unconditionally after duplicate detection, enabling a double-joinChunks() race condition.
src/Storage/Device/S3.php Adds ksort($parts) before building the CompleteMultipartUpload XML body to ensure parts are ordered by number regardless of upload order; also updates @throws docblock style. Safe change.
tests/Storage/Device/LocalTest.php Adds testOutOfOrderUpload and testOutOfOrderUploadWithRetry; updates testJoinChunksMissingPartDoesNotFinalize to reflect that missing parts prevent finalization rather than throwing.
tests/Storage/S3Base.php Adds testOutOfOrderPartUpload which uploads a large file in reverse chunk order to verify S3 handles out-of-order multipart uploads; cleanup is commented out (acknowledged as existing pattern).
Dockerfile Upgrades PHP base from 8.1 to 8.3; replaces hardcoded extension dir path with php-config --extension-dir for portability; switches composer update to composer install; fixes AS casing.

Comments Outside Diff (1)

  1. src/Storage/Device/Local.php, line 86-92 (link)

    P1 countChunks() called unconditionally re-triggers assembly on duplicate

    countChunks() is invoked regardless of whether the chunk was a duplicate (lines 86–92 in upload(); lines 133–137 in uploadData()). If two concurrent workers race on the final chunk — one uploading it fresh and one re-uploading it as a duplicate — both can observe countChunks() == $chunks and both call joinChunks().

    Inside joinChunks(), both workers open $tmpAssemble with 'wb' (truncate/create), stream all parts into the same handle, and then race on rename($tmpAssemble, $path). The second truncation mid-stream of the first worker produces a corrupt or empty assembly file before the winning rename fires.

    An early return in the duplicate path eliminates the race:

    } else {
        // duplicate: count current parts but do NOT attempt assembly
        return $this->countChunks($tmp, $path);
    }

    Apply the same guard in upload().

Reviews (7): Last reviewed commit: "fix: escape glob metacharacters in both ..." | Re-trigger Greptile

Comment thread tests/Storage/S3Base.php
Comment thread src/Storage/Device/Local.php
Comment thread src/Storage/Device/Local.php
Comment thread src/Storage/Device/Local.php Outdated
Comment thread src/Storage/Device/Local.php Outdated
@TorstenDittmann TorstenDittmann merged commit 52d1f89 into main Apr 27, 2026
9 checks passed
@TorstenDittmann TorstenDittmann deleted the feat-out-of-order-chunk-uploads branch April 27, 2026 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants