Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default to HEAD requests when the Companion looks to get meta information about a URL #3417

Merged
merged 3 commits into from
Jan 24, 2022

Conversation

zackbloom
Copy link
Contributor

At present, the URL loading component of the Companion makes two GET requests to the target URL. The first is only used to gather meta information (the files Content-Length and Content-Type), and then is aborted. The second downloads the file. Of course, depending on how much of the body has already been sent, and how well the delivering server understands aborts, this may result in a significant amount of unnecessary body data also being sent in that first meta request.

HTTP has a way to gather meta information about a file, the HEAD HTTP method. It is specified that a response from HEAD should include the Content-Length that will be returned from a GET request, making this a suitable option assuming the server in question implements HEAD correctly.

This change optimistically attempts a HEAD request. If it fails, or it doesn't return a meaningful Content-Length, it fallsback to the GET request currently used.

Compatibility

I believe this should not change the behavior of the module significantly unless:

  • The URL being fetched implements HEAD in a truly standards-incompliant way which returns a value, but not the right value
  • The HEAD request takes a very long time to return it's non-result, slowing down the overall download

Value

Conversely, I believe that this will slightly improve performance, and potentially significantly decrease server overhead, for URLs which implement standards correctly. Given that I imagine most URLs point to well-behaved instances of popular web servers, I think that probably amounts to 99%+ of the requests being made. Given we still will always fallback to a GET request on failure, I think the percentage of URLs this will cause issues with is quite small.

Note

I also included a tiny additional change, adding CHUNK_SIZE as an option which can be specified using env vars. Apologies if it can't be merged in this PR.

@Murderlon Murderlon requested a review from mifi January 10, 2022 09:10
…a information, falling back to a GET request on failure. Add Tus chunk size env var option.
Keep original signature and rewrite code a bit
@mifi
Copy link
Contributor

mifi commented Jan 12, 2022

Thanks for your PR! I agree it make sense to try HEAD first and then retry with GET. I took the liberty of updating doc and rewriting the logic a bit so it reads more easily (imo). Maybe @Murderlon wants to do a review. I did some testing and it still works with S3 signed URLs

@mifi
Copy link
Contributor

mifi commented Jan 12, 2022

The main disadvantage I see is that it will slightly slow down requests to signed S3 urls, as it has to do a HEAD first and then GET, and a lot of companies use S3 to store files.

@zackbloom
Copy link
Contributor Author

zackbloom commented Jan 14, 2022

I think you're rewrite is fantastic @mifi! Phenomenal job making the code more clear.

A simple partial solution to the issue you describe is to look if the hostname ends with (^|\.)s3(.|\-)([a-z\-]+\.)?amazonaws.com. That should match:

s3.amazonaws.com
s3-us-west-2.amazonaws.com
s3.us-west-2.amazonaws.com

@mifi
Copy link
Contributor

mifi commented Jan 14, 2022

Interesting idea! Do you have any source for this? If we implement such a check I would prefer if we can link to an official source showing the list of possible hostnames for S3

@mifi
Copy link
Contributor

mifi commented Jan 24, 2022

I will merge this, thanks! We can optimize in the future if needed, but I don't think it should be a big problem for now.

@mifi mifi merged commit 72c5515 into transloadit:main Jan 24, 2022
Murderlon added a commit that referenced this pull request Jan 31, 2022
* main:
  Upgrade size-limit to 7.0.5 (#3445)
  Unsplash: UI improvements (#3438)
  @uppy/thumbnail-generator: exifr: remove legacy IE support (#3382)
  Default to HEAD requests when the Companion looks to get meta information about a URL (#3417)
  check if info array is empty (#3442)
Murderlon added a commit that referenced this pull request Feb 1, 2022
* main:
  Upgrade size-limit to 7.0.5 (#3445)
  Unsplash: UI improvements (#3438)
  @uppy/thumbnail-generator: exifr: remove legacy IE support (#3382)
  Default to HEAD requests when the Companion looks to get meta information about a URL (#3417)
  check if info array is empty (#3442)
  dev: fix Vite custom plugin (#3437)
  website: add legacy bundle to CDN example (#3433)
  meta: remove unused lerna and npm files (#3436)
  meta: replace browserify with esbuild (#3363)
  Release: uppy@2.4.1 (#3432)
  @uppy/transloadit: fix handling of Tus errors and rate limiting (#3429)
  Add Unsplash to website dashboard example (#3431)
@github-actions github-actions bot mentioned this pull request Feb 14, 2022
@aduh95 aduh95 mentioned this pull request Feb 14, 2022
github-actions bot pushed a commit that referenced this pull request Feb 14, 2022
| Package                   | Version | Package                   | Version |
| ------------------------- | ------- | ------------------------- | ------- |
| @uppy/companion           |   3.2.0 | @uppy/provider-views      |   2.0.7 |
| @uppy/companion-client    |   2.0.5 | @uppy/thumbnail-generator |   2.1.0 |
| @uppy/core                |   2.1.5 | @uppy/robodog             |   2.3.0 |
| @uppy/dashboard           |   2.1.4 | uppy                      |   2.5.0 |
| @uppy/locales             |   2.0.6 |                           |         |

- @uppy/companion: add support for COMPANION_UNSPLASH_SECRET (Mikael Finstad / #3463)
- @uppy/unsplash: fix nested meta (Artur Paikin / #3485)
- meta: fix(docs): typo in property `thumbnailType` (Dan Schalow / #3472)
- @uppy/robodog: add audio, box, unsplash, screen-capture to Robodog (Artur Paikin / #3483)
- meta: consolidate ENV files and fix contributing guidelines (Antoine du Hamel / #3475)
- @uppy/companion-client,@uppy/companion,@uppy/provider-views,@uppy/robodog: Finishing touches on Companion dynamic Oauth (Renée Kooi / #2802)
- meta: Improve companion docs (Mikael Finstad / #3479)
- meta: Make E2E Great Again (Merlijn Vos / #3444)
- meta: Add PostCSS handling to Vite (Artur Paikin / #3467)
- meta: Update CONTRIBUTING.md (Mikael Finstad / #3411)
- @uppy/companion: fix broken thumbnails for box and dropbox (Mikael Finstad / #3460)
- website: fix `Uppy is not defined` error (Antoine du Hamel / #3461)
- @uppy/companion: Implement periodic ping functionality (Mikael Finstad / #3246)
- @uppy/companion: fix callback urls (Mikael Finstad / #3458)
- @uppy/core,@uppy/dashboard,@uppy/thumbnail-generator: Add dashboard and UIPlugin types (Merlijn Vos / #3426)
- @uppy/locales: Add "save" to fr_FR.js (Charly Billaud / #3395)
- @uppy/companion: Fix TypeError when invalid initialization vector (Julian Gruber / #3416)
- meta: Upgrade size-limit to 7.0.5 (Artur Paikin / #3445)
- @uppy/provider-views: Unsplash: UI improvements (Artur Paikin / #3438)
- @uppy/thumbnail-generator: exifr: remove legacy IE support (Artur Paikin / #3382)
- @uppy/companion: Default to HEAD requests when the Companion looks to get meta information about a URL (Zack Bloom / #3417)
- @uppy/dashboard: check if info array is empty (Artur Paikin / #3442)
- meta: dev: fix Vite custom plugin (Antoine du Hamel / #3437)
- website: add legacy bundle to CDN example (Antoine du Hamel / #3433)
- meta: remove unused lerna and npm files (Antoine du Hamel / #3436)
- meta: replace browserify with esbuild (Antoine du Hamel / #3363)
HeavenFox pushed a commit to docsend/uppy that referenced this pull request Jun 27, 2023
…tion about a URL (transloadit#3417)

* Update the url companion to default to a HEAD request to get file meta information, falling back to a GET request on failure. Add Tus chunk size env var option.

* refactor

Keep original signature and rewrite code a bit

* add docs for COMPANION_CHUNK_SIZE

Co-authored-by: Mikael Finstad <finstaden@gmail.com>
HeavenFox pushed a commit to docsend/uppy that referenced this pull request Jun 27, 2023
| Package                   | Version | Package                   | Version |
| ------------------------- | ------- | ------------------------- | ------- |
| @uppy/companion           |   3.2.0 | @uppy/provider-views      |   2.0.7 |
| @uppy/companion-client    |   2.0.5 | @uppy/thumbnail-generator |   2.1.0 |
| @uppy/core                |   2.1.5 | @uppy/robodog             |   2.3.0 |
| @uppy/dashboard           |   2.1.4 | uppy                      |   2.5.0 |
| @uppy/locales             |   2.0.6 |                           |         |

- @uppy/companion: add support for COMPANION_UNSPLASH_SECRET (Mikael Finstad / transloadit#3463)
- @uppy/unsplash: fix nested meta (Artur Paikin / transloadit#3485)
- meta: fix(docs): typo in property `thumbnailType` (Dan Schalow / transloadit#3472)
- @uppy/robodog: add audio, box, unsplash, screen-capture to Robodog (Artur Paikin / transloadit#3483)
- meta: consolidate ENV files and fix contributing guidelines (Antoine du Hamel / transloadit#3475)
- @uppy/companion-client,@uppy/companion,@uppy/provider-views,@uppy/robodog: Finishing touches on Companion dynamic Oauth (Renée Kooi / transloadit#2802)
- meta: Improve companion docs (Mikael Finstad / transloadit#3479)
- meta: Make E2E Great Again (Merlijn Vos / transloadit#3444)
- meta: Add PostCSS handling to Vite (Artur Paikin / transloadit#3467)
- meta: Update CONTRIBUTING.md (Mikael Finstad / transloadit#3411)
- @uppy/companion: fix broken thumbnails for box and dropbox (Mikael Finstad / transloadit#3460)
- website: fix `Uppy is not defined` error (Antoine du Hamel / transloadit#3461)
- @uppy/companion: Implement periodic ping functionality (Mikael Finstad / transloadit#3246)
- @uppy/companion: fix callback urls (Mikael Finstad / transloadit#3458)
- @uppy/core,@uppy/dashboard,@uppy/thumbnail-generator: Add dashboard and UIPlugin types (Merlijn Vos / transloadit#3426)
- @uppy/locales: Add "save" to fr_FR.js (Charly Billaud / transloadit#3395)
- @uppy/companion: Fix TypeError when invalid initialization vector (Julian Gruber / transloadit#3416)
- meta: Upgrade size-limit to 7.0.5 (Artur Paikin / transloadit#3445)
- @uppy/provider-views: Unsplash: UI improvements (Artur Paikin / transloadit#3438)
- @uppy/thumbnail-generator: exifr: remove legacy IE support (Artur Paikin / transloadit#3382)
- @uppy/companion: Default to HEAD requests when the Companion looks to get meta information about a URL (Zack Bloom / transloadit#3417)
- @uppy/dashboard: check if info array is empty (Artur Paikin / transloadit#3442)
- meta: dev: fix Vite custom plugin (Antoine du Hamel / transloadit#3437)
- website: add legacy bundle to CDN example (Antoine du Hamel / transloadit#3433)
- meta: remove unused lerna and npm files (Antoine du Hamel / transloadit#3436)
- meta: replace browserify with esbuild (Antoine du Hamel / transloadit#3363)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants