Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os.path.join() used with URLs in Updater #1077

Closed
jku opened this issue Jul 13, 2020 · 3 comments
Closed

os.path.join() used with URLs in Updater #1077

jku opened this issue Jul 13, 2020 · 3 comments
Labels
client Related to the client (updater) implementation microsoft-windows

Comments

@jku
Copy link
Member

jku commented Jul 13, 2020

mirrors.py does this:

base = os.path.join(mirror_info['url_prefix'], mirror_info['metadata_path'])

This seems like an incorrect use of os.path.join(). I'm not sure what it does on Windows but there's probably room for bugs here. urllib.parse.join() is probably the safe choice (although it has some annoying corner cases that makes it different from os.path.join()) ?

There is a related issue here: I'm trying to figure out how to handle the case where metadata and targets are hosted on different hosts and the use of os.path.join() affects it. The discussion is in https://python.zulipchat.com/#narrow/stream/223926-pep458-implementation/topic/separate.20hosting.20for.20metadata.20and.20targets . Fixing this probably makes sense after we know the resolution of that discussion.

@joshuagl joshuagl added the good first issue Bite-sized items for first time contributors label Aug 10, 2020
@jku
Copy link
Member Author

jku commented Sep 23, 2020

I think we actually have multiple os separator bugs in updater that end up somewhat compensating for each other...

  • get_list_of_mirrors() uses os.path.join() to build a URL which is wrong on Windows, but tries to fix that by running url.replace('\\', '/') in the end
  • Updater._get_target_file() and Updater._update_metadata() use os.path.join() to create the final target path (which should be a relative URL/posix path). This probably creates a broken path on Windows with Windows separators -- but this path is later fed to get_list_of_mirrors() which 'fixes' this by doing the mentioned replace
  • updated_targets() and download_targets() use the target file path (a relative URL/posix path) to build an OS path to save the file in... I wonder what does on windows? Edit: this actually probably works: python seems to swap the separators correctly
  • download_targets() also strips os.sep from the target file path (which again should be a relative posix path)

I'm pretty sure it will be impossible to fix all of these in a way that wouldn't break some client mirror configuration where Windows separators were used in an innovative way... but probably it's still worth doing: If config like that exists it is just waiting to be broken

I no longer agree with "good first issue" label :) (well, some items on the list may be easy, some are tricky)

@jku jku changed the title mirrors.py uses os.path.join() with URLs os.path.join() used with URLs in Updater Sep 23, 2020
@joshuagl joshuagl removed the good first issue Bite-sized items for first time contributors label Sep 23, 2020
@sechkova
Copy link
Contributor

sechkova commented Sep 24, 2020

I think the reference implementation should adopt the same strategy when dealing with URLs/paths on the client side as well as on the repository side. And while the specification at least recommends the format of paths in the metadata:

To avoid surprising behavior when matching targets with PATHPATTERN, it is RECOMMENDED that PATHPATTERN uses the forward slash (/) as directory separator and does not start with a directory separator, akin to TARGETSPATH.

mirrors.py is a fully implementation product ...

So as I see @lukpueh already suggested in #1018, lets align these two ideas with both client and repository:

Maybe we can restrict updater to work only with path-relative-scheme-less-URL strings and add a convenience interface to form the full path URL and/or solve Windows-specific corner cases in the config?

@jku jku added the client Related to the client (updater) implementation label Nov 27, 2020
@jku
Copy link
Member Author

jku commented Nov 30, 2021

I'm going to close this:

  • legacy client does use os.path.join on things that form URLs but then does other processing to the URL deep in the download code
  • ngclient does not use os.path.join on URL parts

@jku jku closed this as completed Nov 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client Related to the client (updater) implementation microsoft-windows
Projects
None yet
Development

No branches or pull requests

3 participants