
feat(s3): upload file bigger than 50GB #7067

Draft · wants to merge 6 commits into master
Conversation

@fbeauchamp (Collaborator) commented Oct 2, 2023

Description

Chunk size computation

Chunk size is (length ?? maxLength) / 10,000, with a minimum of 5 MB and a maximum of 500 MB.

It can be computed from disk utilization (for an XVA export) or from disk size (for a mirror backup).
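The rule above could be sketched as follows (a hypothetical helper, not the PR's actual code; the constant names are assumptions):

```javascript
// Sketch of the part-size rule described above. S3 multipart uploads
// allow at most 10,000 parts, so the part size is derived from the
// expected file length and clamped to the [5 MB, 500 MB] range.
const MIN_PART_SIZE = 5 * 1024 * 1024 // S3's minimum part size
const MAX_PART_SIZE = 500 * 1024 * 1024 // upper bound chosen in the PR
const MAX_PARTS = 10_000

function computePartSize(length, maxLength) {
  const estimated = Math.ceil((length ?? maxLength) / MAX_PARTS)
  return Math.min(MAX_PART_SIZE, Math.max(MIN_PART_SIZE, estimated))
}
```

For example, a 100 GiB file yields roughly 10 MiB parts, while anything under 50 GB hits the 5 MB floor.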

Transfer strategy

  1. Direct transfer: no buffering; only for tar-like formats and only for buckets without object lock. Memory consumption is constant (not related to file size). Uses at most one additional chunk of disk space.
  2. Direct transfer to a temporary file: same as the previous strategy, plus an additional copy step that truncates the padding. Memory consumption is constant. Does not work with backup repository encryption (the file can't easily be truncated).
  3. Buffering: uses one chunk of additional memory; read one chunk, MD5 it, then transfer it. Slower than the previous strategy, but no temporary file.
  4. Double buffering: uses two chunks; read one chunk, then start transferring it while reading the next. Faster than the previous strategy.
  5. Default mode of lib-storage.

I think we should use 2 on non-object-locked, non-encrypted backup repositories, and 5 otherwise. A configuration setting should allow forcing the fallback to lib-storage.
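The proposed selection could look like this (a sketch; the function and flag names are assumptions, not the actual xen-orchestra API):

```javascript
// Sketch of the proposed strategy selection: prefer the fast
// direct-to-temporary-file path (strategy 2) when safe, otherwise
// fall back to @aws-sdk/lib-storage's default Upload (strategy 5).
// A config flag can force the lib-storage fallback.
function chooseUploadStrategy({ encrypted, objectLock, forceLibStorage = false }) {
  if (forceLibStorage || encrypted || objectLock) {
    return 'lib-storage' // strategy 5
  }
  return 'direct-tmp-file' // strategy 2: direct transfer + truncate padding
}
```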

To do

  • implement sensible maxLength/length computation
  • use it to compute PartSize; throw an error if the stream is bigger than 10,000 × PartSize
  • use the MD5 middleware only if the backup repository has object lock
  • use direct transfer as a fast path if the backup repository is not encrypted AND not object-locked

Checklist

  • Commit
    • Title follows commit conventions
    • Reference the relevant issue (Fixes #007, See xoa-support#42, See https://...)
    • If bug fix, add Introduced by
  • Changelog
    • If visible by XOA users, add changelog entry
    • Update "Packages to release" in CHANGELOG.unreleased.md
  • PR
    • If UI changes, add screenshots
    • If not finished or not tested, open as Draft

Also, uploading an object part with a precalculated MD5 and size doesn't consume additional memory compared to a presigned + raw upload.
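As an illustration of that point, the part's MD5 can be computed while the chunk is read and passed up front, so the SDK has no reason to buffer the body again. This is a sketch (not the PR's code); `uploadPartParams` builds the input for `UploadPartCommand` from `@aws-sdk/client-s3`:

```javascript
import { createHash } from 'node:crypto'

// MD5 of a chunk, base64-encoded as S3's Content-MD5 header expects.
function partMD5(chunk) {
  return createHash('md5').update(chunk).digest('base64')
}

// Input for an UploadPartCommand (@aws-sdk/client-s3). Providing
// ContentMD5 and ContentLength up front means the part can be sent
// without an extra hashing/buffering pass; Content-MD5 is also
// required when the bucket has object lock enabled.
function uploadPartParams({ Bucket, Key, UploadId, PartNumber }, chunk) {
  return {
    Bucket,
    Key,
    UploadId,
    PartNumber,
    Body: chunk,
    ContentLength: chunk.length,
    ContentMD5: partMD5(chunk),
  }
}
```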
Comment on lines 232 to 234
if (this._readCounter === undefined) {
  this._readCounter = 0
}
Member

Use a closure (variable outside of the function) and initialize it to 0.
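The reviewer's suggestion could look like this (a sketch with assumed names, not the PR's code): the counter lives in a closure and is initialized exactly once, so the lazy `undefined` check disappears.

```javascript
// Sketch: wrap a read function so the call count is held in a closure
// variable initialized to 0, instead of a lazily-created `this` property.
function makeCountingReader(read) {
  let readCounter = 0 // initialized once, captured by the closure
  const countingRead = (...args) => {
    readCounter += 1
    return read(...args)
  }
  countingRead.count = () => readCounter
  return countingRead
}
```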

region = guessAwsRegion(host),
} = parse(url)
const client = new S3Client({
apiVersion: '2006-03-01',
Member

Is this really what we want?

keepAlive: true,
}),
}),
})
Member

I see we are not passing tls option, is this necessary in our S3 implementation?

@xen-orchestra/fs/testupload.mjs (outdated, resolved)
Comment on lines 111 to 113
while ((data = await read(inputStream, Math.min(remainingBytes, maxChunkPartSize))) !== null) {
  remainingBytes = await write(data, chunkStream, remainingBytes)
}
Member

Why a loop? Doesn't the first read give you everything?

Member

This is for padding, but we should use a TransformStream instead for better perf (less buffering).
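The reviewer's transform-based padding could be sketched like this (assumed names; using Node's `stream.Transform`, one possible reading of the suggestion): data passes through unchanged, and only the final `flush` appends zeros, so nothing beyond the current chunk is buffered.

```javascript
import { Transform } from 'node:stream'

// Sketch: pad a stream with zeros up to `targetLength` using a
// Transform instead of a read/write loop. `transform` passes chunks
// through while counting bytes; `flush` emits the zero padding once
// the source ends.
function createPaddingStream(targetLength) {
  let seen = 0
  return new Transform({
    transform(chunk, _encoding, callback) {
      seen += chunk.length
      callback(null, chunk) // pass data through unchanged
    },
    flush(callback) {
      if (seen < targetLength) {
        this.push(Buffer.alloc(targetLength - seen)) // zero-filled padding
      }
      callback()
    },
  })
}
```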
