Skip to content

Conversation

@casperdcl
Copy link
Contributor

@casperdcl casperdcl commented Mar 2, 2020

Looks like disallowing URLs containing queries happened in b83564d (#2707), i.e. dvc>0.66.1... Not sure why. Seems fine to me to remove the assert not p.query restriction.

@casperdcl casperdcl added the bug Did we break something? label Mar 2, 2020
@casperdcl casperdcl requested a review from Suor March 2, 2020 18:32
@casperdcl casperdcl self-assigned this Mar 2, 2020
@Suor
Copy link
Contributor

Suor commented Mar 5, 2020

This basically breaks URLInfo. Many of its methods will stop to work:
url, __str__, __eq__, __hash__, ... So info will be lost and they will start to behave weird when comparing or when using as dict keys etc.

You should either add full support of params or go around of it. If you are going to implement it then you should leave the old class for anything but http urls for efficiency.

@casperdcl
Copy link
Contributor Author

ok so this is still a bug then. We need to support queries because some sites need them for downloadable links (I seem to recall dropbox for example returns a html webpage if you leave out ?dl=1 - not sure if this is still the case but you get the point)

@casperdcl casperdcl added the enhancement Enhances DVC label Mar 12, 2020
Comment on lines +283 to +286
obj.fill_parts(scheme, host, user, port, path)
obj.params = params
obj.query = query
obj.fragment = fragment
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't bother overriding fill_parts as it's not used publicly anywhere else

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then don't. It used via .replace(), .__div__(), .parent, If the inherited implementation works then you may skip though.

Copy link
Contributor Author

@casperdcl casperdcl Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those use from_parts (which would use this overridden method). The point is nothing else explicitly uses fill_parts directly so there's no need to override fill_parts for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You keep saying don't need to override, but you override. Not sure I understand what you are up to here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean technically the correct way to do things would be:

def fill_parts(self, scheme, host, user, port, path, params, query, fragment):
    super().fill_parts(self, scheme, host, user, port, path)
    self.params = params
    self.query = query
    self.fragment = fragment

but this isn't required right now as fill_parts is really a private method not used anywhere else.

@casperdcl
Copy link
Contributor Author

@shcheklein / @jorgeorpinel / @Suor ready for review

@casperdcl casperdcl added the p1-important Important, aka current backlog of things to do label Mar 12, 2020
@jorgeorpinel jorgeorpinel removed their request for review March 13, 2020 23:38
@casperdcl casperdcl requested a review from efiop March 14, 2020 16:20
@efiop efiop merged commit f19d800 into treeverse:master Mar 17, 2020
Copy link
Contributor

@Suor Suor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some simplifications are possible. Sorry for late review.

Comment on lines +283 to +286
obj.fill_parts(scheme, host, user, port, path)
obj.params = params
obj.query = query
obj.fragment = fragment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then don't. It used via .replace(), .__div__(), .parent, If the inherited implementation works then you may skip though.

def __init__(self, url):
p = urlparse(url)
stripped = p._replace(params=None, query=None, fragment=None)
super().__init__(stripped.geturl())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use restringification? May use .from_parts() or .fill_parts().

Copy link
Contributor Author

@casperdcl casperdcl Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More future-proof to use super() to use the parent's logic. Otherwise we'd need

p = urlparse(url)
-stripped = p._replace(params=None, query=None, fragment=None)
-super().__init__(stripped.geturl())
+assert p.password is None
+self.fill_parts(p.scheme, p.hostname, p.username, p.port, p.path)

and not call super(), which is allowed but not great practice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, we can ignore all of this as long as it works. Some tech debt, but might be resolved later.

Needing to parse, restringify and parse is a kludge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Did we break something? enhancement Enhances DVC p1-important Important, aka current backlog of things to do

Projects

None yet

Development

Successfully merging this pull request may close these issues.

import-url: does not work for URL that contains a query string

3 participants