
Dynamic chunk size #122

Merged
merged 3 commits into twitterdev:master from juanshishido:fix-chunked-ta on Aug 28, 2017

Conversation

3 participants
@juanshishido (Contributor) commented Aug 21, 2017

Context

This addresses issues such as this one, where SDK users are being rate limited when sending large audience files.

The problem with the current implementation

TONUpload's perform method initiates a chunked upload when the input file is greater than or equal to _MIN_FILE_SIZE, currently set at 64 MB (1024 * 1024 * 64). The problem is that it sends chunks of only x-ton-min-chunk-size (1 MB). As a result, a 63 MB file, which follows the single-upload path, makes just two requests to the 1.1/ton/bucket/ta_partner endpoint (the initial POST and the subsequent PUT), while a 65 MB file makes 66 requests to the same endpoint (the initial POST plus 65 chunked PUTs).
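The request-count gap described above can be sketched as follows (a hypothetical helper, not SDK code; the 64 MB threshold and 1 MB chunk size come from the description above):

```python
# Sketch of the request-count math described above (not SDK code).
MIN_FILE_SIZE = 1024 * 1024 * 64   # 64 MB single-upload threshold
MIN_CHUNK_SIZE = 1024 * 1024       # 1 MB, per x-ton-min-chunk-size

def request_count(file_size):
    """Total requests made for a file of `file_size` bytes."""
    if file_size < MIN_FILE_SIZE:
        return 2  # initial POST + single PUT
    # initial POST + one PUT per 1 MB chunk (ceiling division)
    return 1 + -(-file_size // MIN_CHUNK_SIZE)

print(request_count(63 * 1024 * 1024))  # 2
print(request_count(65 * 1024 * 1024))  # 66
```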

Proposal

For chunked uploads, choose the chunk size based on:

  • _RESPONSE_TIME_MAX
  • x-response-time

I've defined a 5-second threshold (_RESPONSE_TIME_MAX = 5000). When the response time for the previous request is less than or equal to that value, we send a max-sized chunk (64 MB, for example). Otherwise, we scale the chunk down by the ratio of the threshold to the actual response time. With the 5-second max, if the previous request took 10 seconds, the next chunk will be 32 MB (assuming the 64 MB default).
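A minimal sketch of that scaling, assuming the 64 MB default and the names above (illustrative, not the SDK implementation):

```python
# Illustrative sketch of the proposed scaling (not the SDK implementation).
_RESPONSE_TIME_MAX = 5000      # ms; threshold from the proposal
_DEFAULT_CHUNK_SIZE_MB = 64    # assumed max chunk size, in MB

def next_chunk_size_mb(prev_response_time_ms):
    """Send a max-sized chunk under the threshold; scale down above it."""
    if prev_response_time_ms <= _RESPONSE_TIME_MAX:
        return _DEFAULT_CHUNK_SIZE_MB
    # Scale by the inverse of (response time / threshold).
    return int(_DEFAULT_CHUNK_SIZE_MB * _RESPONSE_TIME_MAX / prev_response_time_ms)

print(next_chunk_size_mb(3000))   # 64
print(next_chunk_size_mb(10000))  # 32
```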

juanshishido added some commits Aug 21, 2017

Dynamic chunk size
For chunked uploads, choose chunk size based on

* `_RESPONSE_TIME_MAX`
* `x-response-time`
@juanshishido (Contributor) commented Aug 21, 2017

Looking for feedback.

If accepted, the plan is to implement the same logic in the Ruby and PHP SDKs.

```python
response_time = int(response.headers['x-response-time'])
chunk_size = min_chunk_size * size(self._DEFAULT_CHUNK_SIZE,
                                   self._RESPONSE_TIME_MAX,
                                   response_time)
```
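The `size` helper isn't shown in the snippet; a hedged guess at its shape, returning a multiplier of the minimum chunk size (assumed, not the actual SDK code):

```python
def size(default_chunk_size, response_time_max, response_time):
    """Assumed helper: how many min-size chunks to bundle into the next request."""
    if response_time <= response_time_max:
        return default_chunk_size
    # Scale down by threshold / actual response time, never below one chunk.
    return max(1, int(default_chunk_size * response_time_max / response_time))

min_chunk_size = 1024 * 1024  # 1 MiB, from x-ton-min-chunk-size
chunk_size = min_chunk_size * size(64, 5000, 10000)
print(chunk_size // (1024 * 1024))  # 32
```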

@jbabichjapan commented Aug 21, 2017

Just confirming: does this need something like line 253's location = response.headers['location'] from the else branch above on the last chunk? (We had some issues related to whether or not the last chunk has the location header; maybe it's on the first chunk.)

@juanshishido (Contributor) commented Aug 22, 2017

On the POST request to the /1.1/ton/data/ta_partner endpoint, you'll see something like:

location: /1.1/ton/data/ta_partner/2417045708/AeEFBiuxuFY2zgt.txt?resumable=true&resumeId=396213

On the last PUT request, you'll see something like:

location: /ta_partner/2417045708/AeEFBiuxuFY2zgt.txt

What we do with the location value from line 253 is return location.split("?")[0], keeping everything before the ?: /1.1/ton/data/ta_partner/2417045708/AeEFBiuxuFY2zgt.txt.
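The split in question, applied to the resumable location value quoted above:

```python
# Keep everything before the "?" in the resumable location header value.
location = ("/1.1/ton/data/ta_partner/2417045708/AeEFBiuxuFY2zgt.txt"
            "?resumable=true&resumeId=396213")
print(location.split("?")[0])
# /1.1/ton/data/ta_partner/2417045708/AeEFBiuxuFY2zgt.txt
```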

```diff
     resource = "{0}{1}".format(self._DEFAULT_RESOURCE, self.bucket)
     response = self.__upload(resource, open(self._file_path, 'rb').read())
     return response.headers['location']
 else:
     response = self.__init_chunked_upload()
-    chunk_size = int(response.headers['x-ton-min-chunk-size'])
+    min_chunk_size = int(response.headers['x-ton-min-chunk-size'])
+    chunk_size = min_chunk_size * self._DEFAULT_CHUNK_SIZE
```

@tushdante (Contributor) commented Aug 25, 2017

Are we sure that the value of x-ton-min-chunk-size won't change? Having the chunk_size be dependent on this value may cause issues if the header value is updated/corrected. I think we should either:

a. ensure that this header value is updated to reflect the actual minimum chunk size
b. set the chunk_size value as a const

cc @jbabichjapan what do you think?

@juanshishido (Contributor) commented Aug 26, 2017

Thanks, @tushdante!

x-ton-min-chunk-size is only returned on the POST request to the /1.1/ton/bucket/ta_partner endpoint and is always 1 MiB (1,048,576 bytes). We only see that value once during the upload process. Does that change your thought on this?

@jbabichjapan commented Aug 26, 2017

Ideally, the code should handle network errors elegantly on the first upload attempt and on subsequent chunks, but otherwise I think it should be okay to iterate after this check-in.

@juanshishido juanshishido changed the title [WIP] Dynamic chunk size Dynamic chunk size Aug 28, 2017

@juanshishido juanshishido merged commit ee0e3bc into twitterdev:master Aug 28, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@juanshishido juanshishido deleted the juanshishido:fix-chunked-ta branch Aug 28, 2017

@juanshishido juanshishido referenced this pull request Aug 28, 2017

Merged

bump to 1.2.3 #123

@juanshishido juanshishido referenced this pull request Nov 9, 2017

Merged

Fix chunk size for multiple ton upload #161

1 of 3 tasks complete

juanshishido added a commit to twitterdev/twitter-ruby-ads-sdk that referenced this pull request Nov 10, 2017

Fix chunk size for multiple ton upload (#161)
* fix: reduce TONUpload RateLimit risk.

closes #152

* test: add ton_upload spec.

* refactor: follow implementation twitterdev/twitter-python-ads-sdk#122