Adding support to Unicode characters over codepoint 0xffff #63

Merged
sigmavirus24 merged 5 commits into yaml:master on Aug 8, 2017

@peterkmurphy (Contributor) commented May 10, 2017

This patch aims to solve issue #25. As a side effect, the testing code has been trimmed to accommodate the fixes. Why have I done that? I might as well repeat what I said on Gmail:

The problem I am having is with the testing code, which will need a lot of untangling. The tests make some frankly bizarre assumptions about which inputs cannot be parsed, when some of those inputs are in fact parseable by accident. For example:


def test_unicode_input_errors(unicode_filename, verbose=False):
    data = open(unicode_filename, 'rb').read().decode('utf-8')
    for input in [data.encode('latin1', 'ignore'), # <--- Look at this!
                    data.encode('utf-16-be'), data.encode('utf-16-le'),
                    codecs.BOM_UTF8+data.encode('utf-16-be'),
                    codecs.BOM_UTF16_BE+data.encode('utf-16-le'),
                    codecs.BOM_UTF16_LE+data.encode('utf-8')+'!']:
        try:
            yaml.load(input)
        except yaml.YAMLError, exc:
            if verbose:
                print exc
        else:
            raise AssertionError("expected an exception")

The idea: create some bizarre combinations of byte sequences, attempt to parse each one, and raise an AssertionError if no YAMLError is thrown. Except that calling data.encode('latin1', 'ignore') on data produces ten line breaks, which parse happily as YAML. So no exception is raised, and the test fails with an AssertionError.
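
The encoding step can be reproduced in isolation. This is only a minimal sketch: the `data` string below is a hypothetical stand-in for the contents of the test data file (which are not shown here), assumed to be ten lines containing no characters that Latin-1 can represent.

```python
# Hypothetical stand-in for the test data file: ten lines of text made up
# entirely of characters outside Latin-1 (here, three Cyrillic letters).
data = u"".join(u"\u0430\u0431\u0432\n" for _ in range(10))

# errors='ignore' silently drops every character Latin-1 cannot encode,
# so the only thing that survives is the line breaks.
encoded = data.encode("latin1", "ignore")

print(repr(encoded))
print(len(encoded))
```

A stream consisting only of line breaks is an empty YAML stream, so a loader has no reason to raise on it, which is why this input never triggers the expected YAMLError.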

What should I do in this case: remove test_unicode_input_errors from the PyYAML testing code? Yes, the number of tests will go down, which is generally not a good thing, but if the tests are based on dodgy assumptions...

In some cases I have altered the testing code; in others I have removed it.

@@ -32,4 +32,3 @@ Submit bug reports and feature requests to the PyYAML bug tracker:

PyYAML is written by Kirill Simonov <xi@resolvent.net>. It is released
under the MIT license. See the file LICENSE for more details.

Removing trailing newlines is bad form because it's diff noise. I'd undo it in every file where you've done it, which seems to be every file you've touched. Also, files should end with a trailing newline to be POSIX-valid.
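
For anyone wanting to check this before pushing, here is a rough sketch of a trailing-newline check. The helper names `ends_with_newline` and `_write_temp` are made up for illustration and are not part of PyYAML or its test suite; the sample file contents are likewise only an assumption.

```python
import os
import tempfile


def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return True  # an empty file contains no incomplete line
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


def _write_temp(content):
    """Write bytes to a fresh temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(content)
    return path


# One file that ends with a newline, and one where it has been stripped.
good = _write_temp(b"under the MIT license. See the file LICENSE.\n")
bad = _write_temp(b"under the MIT license. See the file LICENSE.")

result_good = ends_with_newline(good)
result_bad = ends_with_newline(bad)

os.remove(good)
os.remove(bad)

print(result_good, result_bad)
```

Reading only the final byte keeps the check cheap even on large files.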

@@ -674,7 +678,7 @@ def analyze_scalar(self, scalar):
     # Check for indicators.
     if index == 0:
         # Leading indicators are special characters.
-        if ch in u'#,[]{}&*!|>\'\"%@`':
+        if ch in u'#,[]{}&*!|>\'\"%@`':

trimming trailing whitespace is bad form as it's diff noise

@peterkmurphy (Contributor Author) commented May 10, 2017

@adamchainz commented May 10, 2017

@peterkmurphy I'm not an admin on this project, I can't do anything to your PR.

@samdmarshall commented May 16, 2017

You may need to ping @sigmavirus24 or another committer for this project to get this merged and a new release made.

adamchainz added a commit to adamchainz/pyyaml that referenced this issue May 16, 2017
adamchainz added a commit to adamchainz/pyyaml that referenced this issue May 16, 2017

@adamchainz commented May 16, 2017

I copied this and tidied it up in #65

@peterkmurphy (Contributor Author) commented May 16, 2017

@sigmavirus24 merged commit 94c3f07 into yaml:master Aug 8, 2017
1 check passed

@sigmavirus24 (Contributor) commented Aug 8, 2017

Thanks @peterkmurphy! 🎉

@jborean93 commented Oct 23, 2018

@ingydotnet is there any chance this fix could be backported to the 3.x branch and a new release made? I can install the pre-release 4.x builds but I haven't seen any action recently that indicates a full release will be made on those changes.

cc @nitzmahone

@nitzmahone (Member) commented Oct 23, 2018

I'd be +1 for that

@ingydotnet (Member) commented Oct 24, 2018

@perlpunk and I are meeting up in a week. We might be able to discuss it then.

7 participants