-
Notifications
You must be signed in to change notification settings - Fork 3
Support URL encoded data #16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @jgberry thanks for taking the time to make these changes! I think you have spotted a couple issues with the docs/code here. As far as the documentation goes it is intentional that we don't automatically quote/unquote data when handling data URL's, that portion of the documentation was intended to imply that the data must be pre-quoted when creating a URL. I would prefer not to update this at the moment because it would break backwards compatibility. It's definitely an issue that creating a URL with unquoted illegal characters works, this should be updated so that an error is returned in this case and new test cases added for that. I have opened issue #17 to address this. I do see the need for this feature though, it could be useful for users to pass unquoted data that will be transformed. I'm open to adding a new constructor function for that. |
|
Thanks for taking a look at this @telday!
That makes sense. Could the docs could be adjusted to reflect this more clearly? Currently the docs say "Otherwise [if the base64_encoded flag is set to False] the data is passed through as is". This doesn't explicitly capture that the data MUST be urlencoded if I'd also like to highlight how these issues currently complicate parsing for users. I'm currently working around this using logic similar to this: url = data_url.DataURL.from_url(url_str)
if url.is_base64_encoded:
data = url.data
else:
data = unquote(url.data)Perhaps an additional property could be added to parallel @property
def decoded_data(self):
if url.is_base64_encoded:
return url.data
else:
return unquote(url.data)This allows users to fetch the data without concerning themselves how the data is represented. |
|
@telday I've made changes based on your feedback. Let me know what you think! |
|
@jgberry thanks for the quick turnaround. I still don't think this is quite right. When I mentioned a new constructor I meant adding a new external function like construct_data_url. I'm not sure why we would update the I also think that by updating the This way allows us to maintain the integrity of the data throughout but gives new options for creation and use of quoted data that is agnostic to whether or not the data is actually quoted for retrieval. e.g. You create two urls, one quoted and one not, but you can access the data either way: url1 = construct_quoted_data_url('application/json', false, '{"some": "json"}')
url2 = construct_data_url('text/plain', false, "data")
...
# Then we can recover the raw data either way with the unquoted property:
print(url1.unquoted_data) # {"some": "json"}
print(url2.unquoted_data) # dataWe will still run into issues if the user has unintentional quote characters in their string but at least we are explicitly telling them that they are going to get unquoted data from the property. |
I see. I missed that external function, it would make sense to move this behavior there.
I'm not sure this makes sense. If the data is base64 encoded, I certainly do not want to unquote it nor could I unquote it; it's a bytes object. What I effectively want with this property is to expose the underlying data stored in the data URL. If the data in the URL is base64 encoded, then the underlying data is the base64 decoded byte string. If the data is not base64 encoded, then the underlying data is the unquoted string. If I understand your objection correctly, you're opposed to this property returning a union of
It is this "data" that the Mozilla documentation refers to that I am trying to expose via the
Per the spec, non-base64 encoded data MUST be quoted, so I'm not sure why we must be agnostic. Is the goal to have this package handle cases outside of the spec? All in all, it seems this feature needs a bit more thought and the issues it aims to solve are easy enough to work around. I'll close this PR for now. |
As per the specification linked in the README.md:
This is presently not adhered to, which can cause the generation of invalid data URLs. Consider the below example:
Likewise, parsing of URLs with escape characters results in
DataURL.datacontaining encoded data:This PR address both issues by unquoting non-base64 data when parsing URLs and by quoting non-base64 data when constructing URLs.
Caveats: If users are presently URL encoding their data prior to passing it to
DataURL.from_data, this would break their URLs (the URL would be double percent encoded).