
Feat-large-file-s3 #17

Merged
lohanidamodar merged 11 commits into feat-large-file-support from feat-large-file-s3
Nov 4, 2021

Conversation

@lohanidamodar
Contributor

@lohanidamodar lohanidamodar commented Sep 10, 2021

Large file upload in S3 using multipart upload

  1. The interface is still the same.
  2. MultipartUpload to S3 is handled internally

Open Question

  1. Should we expose abortMultipartUpload? If a user doesn't want to complete an upload, or wants to clear an incomplete multipart upload, they need to call abortMultipartUpload to remove the parts that were already uploaded; otherwise they will keep being charged for that storage until the upload is aborted.
  2. If some parts fail to upload, should we handle retries internally?
  3. S3 requires every part of a multipart upload (except the last) to be at least 5MB; parts smaller than 5MB are rejected, so we need to handle this in Appwrite.
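To illustrate point 3, a hypothetical helper (not part of this PR) could plan the part sizes for a file up front, enforcing S3's 5MB minimum for all parts except the last:

```php
<?php

// S3's minimum size for every multipart-upload part except the last one.
const MINIMUM_PART_SIZE = 5 * 1024 * 1024; // 5 MB

/**
 * Split a file of $fileSize bytes into a list of part sizes of at most
 * $chunkSize bytes each. Only the final part may be smaller than 5 MB,
 * which S3 permits.
 */
function planParts(int $fileSize, int $chunkSize): array
{
    if ($chunkSize < MINIMUM_PART_SIZE) {
        throw new InvalidArgumentException('Chunk size must be at least 5 MB');
    }

    $parts = [];
    $remaining = $fileSize;
    while ($remaining > 0) {
        $size = \min($chunkSize, $remaining);
        $parts[] = $size;
        $remaining -= $size;
    }

    return $parts;
}
```

With a 5MB chunk size, a 12MB file would be planned as two 5MB parts plus one 2MB final part.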

@lohanidamodar lohanidamodar changed the base branch from main to feat-large-file-support September 10, 2021 08:05
Comment thread: src/Storage/Device/S3.php (Outdated)
$data = \file_get_contents($source);
$this->headers['date'] = \gmdate('D, d M Y H:i:s T');
$this->headers['content-type'] = \mime_content_type($source);
$this->headers['content-md5'] = \base64_encode(md5($data, true)); //TODO whould this work well with big file? can we skip it?

//TODO whould this work well with big file? can we skip it?

Found this in the comments of md5() docs:

If you want to hash a large amount of data you can use the hash_init/hash_update/hash_final functions.
This allows you to hash chunks/parts/incremental or whatever you like to call it.

https://www.php.net/manual/en/function.md5.php#94680
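A minimal sketch of that incremental approach, using a hypothetical helper (not part of this PR) built on hash_init/hash_update/hash_final so the whole file never has to sit in memory:

```php
<?php

/**
 * Compute the MD5 of a file by streaming it in chunks, instead of
 * loading the entire contents with \file_get_contents().
 * Returns the same hex digest as \md5_file($path).
 */
function md5Streamed(string $path, int $chunkSize = 8192): string
{
    $ctx = \hash_init('md5');
    $handle = \fopen($path, 'rb');
    while (!\feof($handle)) {
        \hash_update($ctx, \fread($handle, $chunkSize));
    }
    \fclose($handle);

    return \hash_final($ctx);
}
```

PHP also provides hash_update_file(), which feeds a whole file into an incremental hashing context in one call.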

Contributor Author


Thanks, but I think we can skip this for now, since we will be uploading in small chunks. Any file above 5 MB will be chunked!

// More info on how AWS calculates ETag for multipart upload here
// https://savjee.be/2015/10/Verifying-Amazon-S3-multi-part-uploads-with-ETag-hash/
// TODO
// $this->assertEquals(\md5_file($source), $this->object->getFileHash($dest));

Should we address this TODO before merging?

Contributor Author


I don't think we can, and maybe we don't need to. For every chunk upload we pass the MD5 of that chunk, which S3 validates; if it doesn't match, the part upload fails. So if all parts are uploaded correctly, doesn't that validate that the overall data was uploaded correctly?

Also, I don't know how to address this: AWS calculates the ETag differently for multipart uploads, by hashing the concatenated hashes of the individual parts, so it will never match the original file's MD5. If you have an idea, do share.
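The scheme described in the linked article can be sketched as follows. This is an illustration of how S3 reportedly derives the multipart ETag (MD5 of the concatenated raw part digests, suffixed with the part count), not code from this PR:

```php
<?php

/**
 * Reproduce S3's multipart-upload ETag for a list of part bodies:
 * MD5 over the concatenated 16-byte binary MD5 digests of each part,
 * followed by "-" and the number of parts.
 */
function multipartEtag(array $parts): string
{
    $binaryDigests = '';
    foreach ($parts as $part) {
        $binaryDigests .= \md5($part, true); // raw 16-byte digest per part
    }

    return \md5($binaryDigests) . '-' . \count($parts);
}
```

Because the outer MD5 is taken over digests rather than the file bytes, comparing it against \md5_file($source) can never succeed, which is why the TODO assertion above cannot be enabled as written.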

Comment thread: src/Storage/Device/S3.php
$uri = ($path !== '') ? '/' . \str_replace('%2F', '/', \rawurlencode($path)) : '/';
if($length !== null) {
$end = $offset + $length - 1;
$this->headers['range'] = "bytes=$offset-$end";


Is this the intended string value for the range header?
`['range' => 'bytes=$offset-$end']`

Contributor Author


Yes, it is. The string is double-quoted, so `$offset` and `$end` are interpolated. When reading a chunk, the range header tells S3 which byte range of the file to return.
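A quick sketch of why the quoting matters here: PHP interpolates variables only inside double-quoted strings, so the code in the diff produces a concrete byte range, while a single-quoted version would send the literal text.

```php
<?php

$offset = 0;
$end    = 1023;

// Double quotes interpolate the variables into the header value.
$interpolated = "bytes=$offset-$end"; // bytes=0-1023

// Single quotes would send the placeholder text verbatim instead.
$literal = 'bytes=$offset-$end';      // bytes=$offset-$end
```

This matches the HTTP Range syntax `bytes=<first>-<last>`, where both offsets are inclusive.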

Comment thread: src/Storage/Device/S3.php (Outdated)
$this->headers['content-type'] = \mime_content_type($source);
$this->headers['content-md5'] = \base64_encode(md5($data, true)); //TODO whould this work well with big file? can we skip it?
$this->amzHeaders['x-amz-content-sha256'] = \hash('sha256', $data);
unset($this->amzHeaders['x-amz-acl']);

Why do we unset this header?

Contributor Author


Updated answer in comment.

@lohanidamodar lohanidamodar merged commit baa21c8 into feat-large-file-support Nov 4, 2021
@lohanidamodar lohanidamodar deleted the feat-large-file-s3 branch November 4, 2021 09:50