Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

increase size of index, line, and column fields #310

Merged
merged 8 commits into from Dec 13, 2019

Conversation

@dwightguth
Copy link
Contributor

@dwightguth dwightguth commented Jun 24, 2019

I got the following exception from pyyaml when I tried to parse a yaml file that was greater than 2**31 characters long:

File "ext/_yaml.pyx", line 781, in _yaml.CParser._compose_scalar_node (ext/_yaml.c:11984)
File "ext/_yaml.pyx", line 72, in _yaml.Mark.__init__ (ext/_yaml.c:1814)
OverflowError: value too large to convert to int

This appears to be related to the use of cython. This PR should fix that issue, at least up to files of length 2**64.

@nitzmahone nitzmahone self-assigned this Aug 21, 2019
Copy link
Member

@nitzmahone nitzmahone left a comment

At a glance, this is incorrect for 32-bit architectures (actually I'm curious how the existing thing isn't breaking under 64-bit, unless x64 struct alignment is saving our bacon). Should probably be Py_ssize_t to match the C header on any architecture.

@dwightguth
Copy link
Contributor Author

@dwightguth dwightguth commented Aug 21, 2019

@nitzmahone I'm assuming you actually meant an unsigned type since it seems that yaml_mark_t uses size_t and not ssize_t. This should correct it so it has the same signature as yaml_mark_t though.

@nitzmahone
Copy link
Member

@nitzmahone nitzmahone commented Aug 23, 2019

Sorry, yeah- that's what I get for looking it over and commenting on my way out the door. ;)

Looking over the other stuff in there, I see several other uses of int in cdefs that also seem like they would be problematic for large files, so I'm wondering how you're not running into those... I'm a little nervous to just accept this change without a test to actually explore the limits of (at least) 32-bit architectures, and to ensure that it works well past those limits on 64-bit.

@dwightguth dwightguth force-pushed the mark branch 2 times, most recently from 31a5a2c to fd193d5 Aug 23, 2019
@dwightguth
Copy link
Contributor Author

@dwightguth dwightguth commented Aug 23, 2019

@nitzmahone this is ready for review again. I added a test that tests the 64-bit case. The test fails without the change I made and passes with it. However, currently the test is disabled on 32 bits because there seems to be some other problem with parsing when the size of the file exceeds the size of an ssize_t. Since I'm not interested in 32-bit, I'd rather we at least get 64 bit working, even if files of size 2**31 < size < 2**32 are still broken on 32 bits (at least based on what I saw on AppVeyor).

@dwightguth
Copy link
Contributor Author

@dwightguth dwightguth commented Aug 23, 2019

Note that this test requires at least 2gb of free hard disk space, though... you can decide whether you're happy with that or not, but it seemed better to me than requiring 2gb of RAM... And I'm pretty sure we can't avoid at least one of those two things.

@dwightguth
Copy link
Contributor Author

@dwightguth dwightguth commented Sep 3, 2019

@nitzmahone your thoughts on this?

@nitzmahone
Copy link
Member

@nitzmahone nitzmahone commented Sep 4, 2019

I'm a little torn on the disk vs memory requirement, but this is fine for now... One other minor thing: can you add unittest.skip() with an appropriate description on 32-bit platforms rather than just implicitly passing? That way maybe we can get back to a 32-bit appropriate test at some point. Thanks for running with this!

@dwightguth
Copy link
Contributor Author

@dwightguth dwightguth commented Sep 4, 2019

@nitzmahone It looks like you're not actually using the unittest module. I would probably have to add custom logic to test_appliance.py in order to get it to have skip functionality that would work for this test. Do you still want me to?

@nitzmahone
Copy link
Member

@nitzmahone nitzmahone commented Sep 4, 2019

Ah crap, I forgot this one is ancient and "custom" (cutting this project over to pytest is on my "one of these years" list)... It's fine as-is then. I've added this PR to the 5.2 release project, so it won't get merged until we start prepping that release branch, but it's good to go and should be included in the 5.2 release. Thanks!

@dwightguth
Copy link
Contributor Author

@dwightguth dwightguth commented Sep 4, 2019

Thank you!

@abergeron
Copy link

@abergeron abergeron commented Sep 26, 2019

I've also encountered the same problem. I also came up with the same solution (more or less). And made a PR before finding this one.

Is there any estimate as to when PyYAML 5.2 will be released?

@perlpunk
Copy link
Member

@perlpunk perlpunk commented Dec 9, 2019

Would it be possible to skip this test by default and enable it via an environment variable?
It takes a long time to run the tests now, and my system freezes for a moment.

@dwightguth
Copy link
Contributor Author

@dwightguth dwightguth commented Dec 10, 2019

@perlpunk I added the if case for the environment variable like you suggested.

Copy link
Member

@nitzmahone nitzmahone left a comment

Also needs to have the var set in an env block in travis.yml and an environment block in appveyor.yml so we're sure the test is still working before getting merged. Otherwise LGTM- in the future we might want to add a warning when the test is skipped and how to enable it, but no need to gild the lily right now. ;) Thanks!

tests/lib/test_yaml_ext.py Outdated Show resolved Hide resolved
tests/lib3/test_yaml_ext.py Show resolved Hide resolved
@dwightguth
Copy link
Contributor Author

@dwightguth dwightguth commented Dec 11, 2019

I fixed the two inline comments you made, but I'm having trouble figuring out the correct fix for the travis.yml and I'm assuming since I'm an external collaborator that it won't pick up my changes to that file anyway until after it gets merged. So I thought I'd ask for assistance rather than simply guessing at the correct change and hoping that if it's wrong, someone will catch it in code review before it breaks your CI.

Basically, I'm a little unsure how to add an environment variable to every build in the matrix. I found the documentation for travis online but it seems to all be describing a completely different structure than I see in your travis.yml, so I'm not really sure what the right change is to make. Can anyone help me figure out what is going on? or maybe point me to documentation that will help me understand why the examples I see in the documentation don't match the structure of the file I'm trying to modify?

@nitzmahone
Copy link
Member

@nitzmahone nitzmahone commented Dec 11, 2019

That's fine- I can add a commit to this PR to do that, as well as generifying the envvar name. Thanks for getting it in!

@nitzmahone nitzmahone changed the base branch from master to release/5.3 Dec 11, 2019
@perlpunk perlpunk merged commit 25a00c3 into yaml:release/5.3 Dec 13, 2019
2 checks passed
@perlpunk
Copy link
Member

@perlpunk perlpunk commented Dec 13, 2019

Thank you both, merged!

@perlpunk perlpunk moved this from Backlog to Merged to release/5.3 in 5.3 Release Dec 13, 2019
@perlpunk
Copy link
Member

@perlpunk perlpunk commented Dec 14, 2019

I also added the following line to tox.ini:
passenv = PYYAML_TEST_GROUP
I was wondering why the tests didn't seem to be a bit slower after this PR :)

perlpunk added a commit that referenced this issue Dec 20, 2019
* increase size of index, line, and column fields

* use size_t instead of unsigned long long

* better test infrastructure for test for large file

* only run large file test when env var is set

* fix review comments regarding env vars

* fix missing import on python 3

* force all tests in CI
@perlpunk
Copy link
Member

@perlpunk perlpunk commented Jan 6, 2020

asherf added a commit to asherf/pants that referenced this issue Apr 28, 2020
https://github.com/yaml/pyyaml/blob/d0d660d035905d9c49fc0f8dafb579d2cc68c0c8/CHANGES#L7

5.3.1 (2020-03-18)

* yaml/pyyaml#386 -- Prevents arbitrary code execution during python/object/new constructor

5.3 (2020-01-06)

* yaml/pyyaml#290 -- Use `is` instead of equality for comparing with `None`
* yaml/pyyaml#270 -- fix typos and stylistic nit
* yaml/pyyaml#309 -- Fix up small typo
* yaml/pyyaml#161 -- Fix handling of __slots__
* yaml/pyyaml#358 -- Allow calling add_multi_constructor with None
* yaml/pyyaml#285 -- Add use of safe_load() function in README
* yaml/pyyaml#351 -- Fix reader for Unicode code points over 0xFFFF
* yaml/pyyaml#360 -- Enable certain unicode tests when maxunicode not > 0xffff
* yaml/pyyaml#359 -- Use full_load in yaml-highlight example
* yaml/pyyaml#244 -- Document that PyYAML is implemented with Cython
* yaml/pyyaml#329 -- Fix for Python 3.10
* yaml/pyyaml#310 -- increase size of index, line, and column fields
* yaml/pyyaml#260 -- remove some unused imports
* yaml/pyyaml#163 -- Create timezone-aware datetimes when parsed as such
* yaml/pyyaml#363 -- Add tests for timezone

5.2 (2019-12-02)
------------------

* Repair incompatibilities introduced with 5.1. The default Loader was changed,
  but several methods like add_constructor still used the old default
  yaml/pyyaml#279 -- A more flexible fix for custom tag constructors
  yaml/pyyaml#287 -- Change default loader for yaml.add_constructor
  yaml/pyyaml#305 -- Change default loader for add_implicit_resolver, add_path_resolver
* Make FullLoader safer by removing python/object/apply from the default FullLoader
  yaml/pyyaml#347 -- Move constructor for object/apply to UnsafeConstructor
* Fix bug introduced in 5.1 where quoting went wrong on systems with sys.maxunicode <= 0xffff
  yaml/pyyaml#276 -- Fix logic for quoting special characters
* Other PRs:
  yaml/pyyaml#280 -- Update CHANGES for 5.1
asherf added a commit to asherf/pants that referenced this issue Apr 29, 2020
https://github.com/yaml/pyyaml/blob/d0d660d035905d9c49fc0f8dafb579d2cc68c0c8/CHANGES#L7

5.3.1 (2020-03-18)

* yaml/pyyaml#386 -- Prevents arbitrary code execution during python/object/new constructor

5.3 (2020-01-06)

* yaml/pyyaml#290 -- Use `is` instead of equality for comparing with `None`
* yaml/pyyaml#270 -- fix typos and stylistic nit
* yaml/pyyaml#309 -- Fix up small typo
* yaml/pyyaml#161 -- Fix handling of __slots__
* yaml/pyyaml#358 -- Allow calling add_multi_constructor with None
* yaml/pyyaml#285 -- Add use of safe_load() function in README
* yaml/pyyaml#351 -- Fix reader for Unicode code points over 0xFFFF
* yaml/pyyaml#360 -- Enable certain unicode tests when maxunicode not > 0xffff
* yaml/pyyaml#359 -- Use full_load in yaml-highlight example
* yaml/pyyaml#244 -- Document that PyYAML is implemented with Cython
* yaml/pyyaml#329 -- Fix for Python 3.10
* yaml/pyyaml#310 -- increase size of index, line, and column fields
* yaml/pyyaml#260 -- remove some unused imports
* yaml/pyyaml#163 -- Create timezone-aware datetimes when parsed as such
* yaml/pyyaml#363 -- Add tests for timezone

5.2 (2019-12-02)
------------------

* Repair incompatibilities introduced with 5.1. The default Loader was changed,
  but several methods like add_constructor still used the old default
  yaml/pyyaml#279 -- A more flexible fix for custom tag constructors
  yaml/pyyaml#287 -- Change default loader for yaml.add_constructor
  yaml/pyyaml#305 -- Change default loader for add_implicit_resolver, add_path_resolver
* Make FullLoader safer by removing python/object/apply from the default FullLoader
  yaml/pyyaml#347 -- Move constructor for object/apply to UnsafeConstructor
* Fix bug introduced in 5.1 where quoting went wrong on systems with sys.maxunicode <= 0xffff
  yaml/pyyaml#276 -- Fix logic for quoting special characters
* Other PRs:
  yaml/pyyaml#280 -- Update CHANGES for 5.1
Eric-Arellano pushed a commit to pantsbuild/pants that referenced this issue May 1, 2020
https://github.com/yaml/pyyaml/blob/d0d660d035905d9c49fc0f8dafb579d2cc68c0c8/CHANGES#L7

5.3.1 (2020-03-18)

* yaml/pyyaml#386 -- Prevents arbitrary code execution during python/object/new constructor

5.3 (2020-01-06)

* yaml/pyyaml#290 -- Use `is` instead of equality for comparing with `None`
* yaml/pyyaml#270 -- fix typos and stylistic nit
* yaml/pyyaml#309 -- Fix up small typo
* yaml/pyyaml#161 -- Fix handling of __slots__
* yaml/pyyaml#358 -- Allow calling add_multi_constructor with None
* yaml/pyyaml#285 -- Add use of safe_load() function in README
* yaml/pyyaml#351 -- Fix reader for Unicode code points over 0xFFFF
* yaml/pyyaml#360 -- Enable certain unicode tests when maxunicode not > 0xffff
* yaml/pyyaml#359 -- Use full_load in yaml-highlight example
* yaml/pyyaml#244 -- Document that PyYAML is implemented with Cython
* yaml/pyyaml#329 -- Fix for Python 3.10
* yaml/pyyaml#310 -- increase size of index, line, and column fields
* yaml/pyyaml#260 -- remove some unused imports
* yaml/pyyaml#163 -- Create timezone-aware datetimes when parsed as such
* yaml/pyyaml#363 -- Add tests for timezone

5.2 (2019-12-02)
------------------

* Repair incompatibilities introduced with 5.1. The default Loader was changed,
  but several methods like add_constructor still used the old default
  yaml/pyyaml#279 -- A more flexible fix for custom tag constructors
  yaml/pyyaml#287 -- Change default loader for yaml.add_constructor
  yaml/pyyaml#305 -- Change default loader for add_implicit_resolver, add_path_resolver
* Make FullLoader safer by removing python/object/apply from the default FullLoader
  yaml/pyyaml#347 -- Move constructor for object/apply to UnsafeConstructor
* Fix bug introduced in 5.1 where quoting went wrong on systems with sys.maxunicode <= 0xffff
  yaml/pyyaml#276 -- Fix logic for quoting special characters
* Other PRs:
  yaml/pyyaml#280 -- Update CHANGES for 5.1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
5.2 Release
Awaiting triage
5.3 Release
Merged to release/5.3
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants