Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert returned binary data from call() to string #21656

Merged
merged 1 commit into from Feb 10, 2020

Conversation

ziransun
Copy link
Member

@ziransun ziransun commented Feb 7, 2020

This changes is based on the same cause as
commit 6ea4b6f

subprocess.check_output() returns binary data as output. In python2,
binary is basically an alias of str so it can be directly used as a
normal string. However in python3 the binary data is different to a
string. Inserting binary data into a string in python3 generates
strings like '/some/path/with/b'binary'/data/inserted' which are not
correct file paths. We must decode() the returned data in order to use
it as a string.

Copy link
Member

@Hexcles Hexcles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks non-ASCII output on Python2, and also introduces inconsistent behaviours between 2 and 3.

decode() uses the default decoding in Python 2, which is ASCII; but in Python 3, the encoding parameter defaults to utf-8. I think it's better to explicitly set encoding="utf-8" so that it behaves the same on 2 and 3 and does not break non-ASCII output.

@@ -289,7 +289,7 @@ def find_webdriver(self, channel=None):
return find_executable("geckodriver")

def get_version_and_channel(self, binary):
version_string = call(binary, "--version").strip()
version_string = call(binary, "--version").decode('utf8').strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm nervous about having to do this at every callsite, vs just making call always return text. That's a bigger change, but is there any reason to prefer this approach?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't have any preference. I guess I was a bit over-cautious.
Just updated the code. Please review. Thank you!

This changes is based on the same cause as
commit 6ea4b6f

subprocess.check_output() returns binary data as output. In python2,
binary is basically an alias of str so it can be directly used as a
normal string. However in python3 the binary data is different to a
string. Inserting binary data into a string in python3 generates
strings like '/some/path/with/b'binary'/data/inserted' which are not
correct file paths. We must decode() the returned data in order to use
it as a string.

This change also explicitly sets encoding="utf-8". In Python2,
default decoding is ASCII while in Python 3 it is "utf-8". Leaving
encoding parameters as defaults will break non-ASCII output on Python 2.
Copy link
Member

@Hexcles Hexcles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming we only deal with UTF-8 output (which is probably a safe assumption), this LGTM

@jgraham
Copy link
Contributor

jgraham commented Feb 10, 2020

I think embedding that assumption in a single place is better than putting it in lots of places…

@jgraham jgraham merged commit 9ce6f13 into web-platform-tests:master Feb 10, 2020
@ziransun ziransun deleted the binary branch May 11, 2020 10:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants