Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

increase size of index, line, and column fields #310

Merged
merged 8 commits into from
Dec 13, 2019

Conversation

dwightguth
Copy link

I got the following exception from pyyaml when I tried to parse a yaml file that was greater than 2**31 characters long:

File "ext/_yaml.pyx", line 781, in _yaml.CParser._compose_scalar_node (ext/_yaml.c:11984)
File "ext/_yaml.pyx", line 72, in _yaml.Mark.__init__ (ext/_yaml.c:1814)
OverflowError: value too large to convert to int

This appears to be related to the use of cython. This PR should fix that issue, at least up to files of length 2**64.

@nitzmahone nitzmahone self-assigned this Aug 21, 2019
Copy link
Member

@nitzmahone nitzmahone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a glance, this is incorrect for 32-bit architectures (actually I'm curious how the existing thing isn't breaking under 64-bit, unless x64 struct alignment is saving our bacon). Should probably be Py_ssize_t to match the C header on any architecture.

@dwightguth
Copy link
Author

@nitzmahone I'm assuming you actually meant an unsigned type since it seems that yaml_mark_t uses size_t and not ssize_t. This should correct it so it has the same signature as yaml_mark_t though.

@nitzmahone
Copy link
Member

Sorry, yeah- that's what I get for looking it over and commenting on my way out the door. ;)

Looking over the other stuff in there, I see several other uses of int in cdefs that also seem like they would be problematic for large files, so I'm wondering how you're not running into those... I'm a little nervous to just accept this change without a test to actually explore the limits of (at least) 32-bit architectures, and to ensure that it works well past those limits on 64-bit.

@dwightguth
Copy link
Author

dwightguth commented Aug 23, 2019

@nitzmahone this is ready for review again. I added a test that tests the 64-bit case. The test fails without the change I made and passes with it. However, currently the test is disabled on 32 bits because there seems to be some other problem with parsing when the size of the file exceeds the size of an ssize_t. Since I'm not interested in 32-bit, I'd rather we at least get 64 bit working, even if files of size 2**31 < size < 2**32 are still broken on 32 bits (at least based on what I saw on AppVeyor).

@dwightguth
Copy link
Author

Note that this test requires at least 2gb of free hard disk space, though... you can decide whether you're happy with that or not, but it seemed better to me than requiring 2gb of RAM... And I'm pretty sure we can't avoid at least one of those two things.

@dwightguth
Copy link
Author

@nitzmahone your thoughts on this?

@nitzmahone
Copy link
Member

I'm a little torn on the disk vs memory requirement, but this is fine for now... One other minor thing: can you add unittest.skip() with an appropriate description on 32-bit platforms rather than just implicitly passing? That way maybe we can get back to a 32-bit appropriate test at some point. Thanks for running with this!

@dwightguth
Copy link
Author

@nitzmahone It looks like you're not actually using the unittest module. I would probably have to add custom logic to test_appliance.py in order to get it to have skip functionality that would work for this test. Do you still want me to?

@nitzmahone
Copy link
Member

Ah crap, I forgot this one is ancient and "custom" (cutting this project over to pytest is on my "one of these years" list)... It's fine as-is then. I've added this PR to the 5.2 release project, so it won't get merged until we start prepping that release branch, but it's good to go and should be included in the 5.2 release. Thanks!

@dwightguth
Copy link
Author

Thank you!

@abergeron
Copy link

I've also encountered the same problem. I also came up with the same solution (more or less). And made a PR before finding this one.

Is there any estimate as to when PyYAML 5.2 will be released?

@perlpunk
Copy link
Member

perlpunk commented Dec 9, 2019

Would it be possible to skip this test by default and enable it via an environment variable?
It takes a long time to run the tests now, and my system freezes for a moment.

@dwightguth
Copy link
Author

@perlpunk I added the if case for the environment variable like you suggested.

Copy link
Member

@nitzmahone nitzmahone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needs to have the var set in an env block in travis.yml and an environment block in appveyor.yml so we're sure the test is still working before getting merged. Otherwise LGTM- in the future we might want to add a warning when the test is skipped and how to enable it, but no need to gild the lily right now. ;) Thanks!

tests/lib/test_yaml_ext.py Outdated Show resolved Hide resolved
tests/lib3/test_yaml_ext.py Show resolved Hide resolved
@dwightguth
Copy link
Author

I fixed the two inline comments you made, but I'm having trouble figuring out the correct fix for the travis.yml and I'm assuming since I'm an external collaborator that it won't pick up my changes to that file anyway until after it gets merged. So I thought I'd ask for assistance rather than simply guessing at the correct change and hoping that if it's wrong, someone will catch it in code review before it breaks your CI.

Basically, I'm a little unsure how to add an environment variable to every build in the matrix. I found the documentation for travis online but it seems to all be describing a completely different structure than I see in your travis.yml, so I'm not really sure what the right change is to make. Can anyone help me figure out what is going on? or maybe point me to documentation that will help me understand why the examples I see in the documentation don't match the structure of the file I'm trying to modify?

@nitzmahone
Copy link
Member

That's fine- I can add a commit to this PR to do that, as well as generifying the envvar name. Thanks for getting it in!

@nitzmahone nitzmahone changed the base branch from master to release/5.3 December 11, 2019 23:09
@perlpunk perlpunk merged commit 25a00c3 into yaml:release/5.3 Dec 13, 2019
@perlpunk
Copy link
Member

Thank you both, merged!

@perlpunk
Copy link
Member

I also added the following line to tox.ini:
passenv = PYYAML_TEST_GROUP
I was wondering why the tests didn't seem to be a bit slower after this PR :)

perlpunk pushed a commit that referenced this pull request Dec 20, 2019
* increase size of index, line, and column fields

* use size_t instead of unsigned long long

* better test infrastructure for test for large file

* only run large file test when env var is set

* fix review comments regarding env vars

* fix missing import on python 3

* force all tests in CI
@perlpunk
Copy link
Member

perlpunk commented Jan 6, 2020

released https://pypi.org/project/PyYAML/5.3/

asherf added a commit to asherf/pants that referenced this pull request Apr 28, 2020
https://github.com/yaml/pyyaml/blob/d0d660d035905d9c49fc0f8dafb579d2cc68c0c8/CHANGES#L7

5.3.1 (2020-03-18)

* yaml/pyyaml#386 -- Prevents arbitrary code execution during python/object/new constructor

5.3 (2020-01-06)

* yaml/pyyaml#290 -- Use `is` instead of equality for comparing with `None`
* yaml/pyyaml#270 -- fix typos and stylistic nit
* yaml/pyyaml#309 -- Fix up small typo
* yaml/pyyaml#161 -- Fix handling of __slots__
* yaml/pyyaml#358 -- Allow calling add_multi_constructor with None
* yaml/pyyaml#285 -- Add use of safe_load() function in README
* yaml/pyyaml#351 -- Fix reader for Unicode code points over 0xFFFF
* yaml/pyyaml#360 -- Enable certain unicode tests when maxunicode not > 0xffff
* yaml/pyyaml#359 -- Use full_load in yaml-highlight example
* yaml/pyyaml#244 -- Document that PyYAML is implemented with Cython
* yaml/pyyaml#329 -- Fix for Python 3.10
* yaml/pyyaml#310 -- increase size of index, line, and column fields
* yaml/pyyaml#260 -- remove some unused imports
* yaml/pyyaml#163 -- Create timezone-aware datetimes when parsed as such
* yaml/pyyaml#363 -- Add tests for timezone

5.2 (2019-12-02)
------------------

* Repair incompatibilities introduced with 5.1. The default Loader was changed,
  but several methods like add_constructor still used the old default
  yaml/pyyaml#279 -- A more flexible fix for custom tag constructors
  yaml/pyyaml#287 -- Change default loader for yaml.add_constructor
  yaml/pyyaml#305 -- Change default loader for add_implicit_resolver, add_path_resolver
* Make FullLoader safer by removing python/object/apply from the default FullLoader
  yaml/pyyaml#347 -- Move constructor for object/apply to UnsafeConstructor
* Fix bug introduced in 5.1 where quoting went wrong on systems with sys.maxunicode <= 0xffff
  yaml/pyyaml#276 -- Fix logic for quoting special characters
* Other PRs:
  yaml/pyyaml#280 -- Update CHANGES for 5.1
asherf added a commit to asherf/pants that referenced this pull request Apr 29, 2020
https://github.com/yaml/pyyaml/blob/d0d660d035905d9c49fc0f8dafb579d2cc68c0c8/CHANGES#L7

5.3.1 (2020-03-18)

* yaml/pyyaml#386 -- Prevents arbitrary code execution during python/object/new constructor

5.3 (2020-01-06)

* yaml/pyyaml#290 -- Use `is` instead of equality for comparing with `None`
* yaml/pyyaml#270 -- fix typos and stylistic nit
* yaml/pyyaml#309 -- Fix up small typo
* yaml/pyyaml#161 -- Fix handling of __slots__
* yaml/pyyaml#358 -- Allow calling add_multi_constructor with None
* yaml/pyyaml#285 -- Add use of safe_load() function in README
* yaml/pyyaml#351 -- Fix reader for Unicode code points over 0xFFFF
* yaml/pyyaml#360 -- Enable certain unicode tests when maxunicode not > 0xffff
* yaml/pyyaml#359 -- Use full_load in yaml-highlight example
* yaml/pyyaml#244 -- Document that PyYAML is implemented with Cython
* yaml/pyyaml#329 -- Fix for Python 3.10
* yaml/pyyaml#310 -- increase size of index, line, and column fields
* yaml/pyyaml#260 -- remove some unused imports
* yaml/pyyaml#163 -- Create timezone-aware datetimes when parsed as such
* yaml/pyyaml#363 -- Add tests for timezone

5.2 (2019-12-02)
------------------

* Repair incompatibilities introduced with 5.1. The default Loader was changed,
  but several methods like add_constructor still used the old default
  yaml/pyyaml#279 -- A more flexible fix for custom tag constructors
  yaml/pyyaml#287 -- Change default loader for yaml.add_constructor
  yaml/pyyaml#305 -- Change default loader for add_implicit_resolver, add_path_resolver
* Make FullLoader safer by removing python/object/apply from the default FullLoader
  yaml/pyyaml#347 -- Move constructor for object/apply to UnsafeConstructor
* Fix bug introduced in 5.1 where quoting went wrong on systems with sys.maxunicode <= 0xffff
  yaml/pyyaml#276 -- Fix logic for quoting special characters
* Other PRs:
  yaml/pyyaml#280 -- Update CHANGES for 5.1
Eric-Arellano pushed a commit to pantsbuild/pants that referenced this pull request May 1, 2020
https://github.com/yaml/pyyaml/blob/d0d660d035905d9c49fc0f8dafb579d2cc68c0c8/CHANGES#L7

5.3.1 (2020-03-18)

* yaml/pyyaml#386 -- Prevents arbitrary code execution during python/object/new constructor

5.3 (2020-01-06)

* yaml/pyyaml#290 -- Use `is` instead of equality for comparing with `None`
* yaml/pyyaml#270 -- fix typos and stylistic nit
* yaml/pyyaml#309 -- Fix up small typo
* yaml/pyyaml#161 -- Fix handling of __slots__
* yaml/pyyaml#358 -- Allow calling add_multi_constructor with None
* yaml/pyyaml#285 -- Add use of safe_load() function in README
* yaml/pyyaml#351 -- Fix reader for Unicode code points over 0xFFFF
* yaml/pyyaml#360 -- Enable certain unicode tests when maxunicode not > 0xffff
* yaml/pyyaml#359 -- Use full_load in yaml-highlight example
* yaml/pyyaml#244 -- Document that PyYAML is implemented with Cython
* yaml/pyyaml#329 -- Fix for Python 3.10
* yaml/pyyaml#310 -- increase size of index, line, and column fields
* yaml/pyyaml#260 -- remove some unused imports
* yaml/pyyaml#163 -- Create timezone-aware datetimes when parsed as such
* yaml/pyyaml#363 -- Add tests for timezone

5.2 (2019-12-02)
------------------

* Repair incompatibilities introduced with 5.1. The default Loader was changed,
  but several methods like add_constructor still used the old default
  yaml/pyyaml#279 -- A more flexible fix for custom tag constructors
  yaml/pyyaml#287 -- Change default loader for yaml.add_constructor
  yaml/pyyaml#305 -- Change default loader for add_implicit_resolver, add_path_resolver
* Make FullLoader safer by removing python/object/apply from the default FullLoader
  yaml/pyyaml#347 -- Move constructor for object/apply to UnsafeConstructor
* Fix bug introduced in 5.1 where quoting went wrong on systems with sys.maxunicode <= 0xffff
  yaml/pyyaml#276 -- Fix logic for quoting special characters
* Other PRs:
  yaml/pyyaml#280 -- Update CHANGES for 5.1
mtremer pushed a commit to ipfire/ipfire-2.x that referenced this pull request Feb 14, 2022
- Update from 3.13 to 6.0
- Update of rootfile
- Changelog
6.0 (2021-10-13)
* yaml/pyyaml#327 -- Change README format to Markdown
* yaml/pyyaml#483 -- Add a test for YAML 1.1 types
* yaml/pyyaml#497 -- fix float resolver to ignore `.` and `._`
* yaml/pyyaml#550 -- drop Python 2.7
* yaml/pyyaml#553 -- Fix spelling of “hexadecimal”
* yaml/pyyaml#556 -- fix representation of Enum subclasses
* yaml/pyyaml#557 -- fix libyaml extension compiler warnings
* yaml/pyyaml#560 -- fix ResourceWarning on leaked file descriptors
* yaml/pyyaml#561 -- always require `Loader` arg to `yaml.load()`
* yaml/pyyaml#564 -- remove remaining direct distutils usage
5.4.1 (2021-01-20)
* yaml/pyyaml#480 -- Fix stub compat with older pyyaml versions that may unwittingly load it
5.4 (2021-01-19)
* yaml/pyyaml#407 -- Build modernization, remove distutils, fix metadata, build wheels, CI to GHA
* yaml/pyyaml#472 -- Fix for CVE-2020-14343, moves arbitrary python tags to UnsafeLoader
* yaml/pyyaml#441 -- Fix memory leak in implicit resolver setup
* yaml/pyyaml#392 -- Fix py2 copy support for timezone objects
* yaml/pyyaml#378 -- Fix compatibility with Jython
5.3.1 (2020-03-18)
* yaml/pyyaml#386 -- Prevents arbitrary code execution during python/object/new constructor
5.3 (2020-01-06)
* yaml/pyyaml#290 -- Use `is` instead of equality for comparing with `None`
* yaml/pyyaml#270 -- Fix typos and stylistic nit
* yaml/pyyaml#309 -- Fix up small typo
* yaml/pyyaml#161 -- Fix handling of __slots__
* yaml/pyyaml#358 -- Allow calling add_multi_constructor with None
* yaml/pyyaml#285 -- Add use of safe_load() function in README
* yaml/pyyaml#351 -- Fix reader for Unicode code points over 0xFFFF
* yaml/pyyaml#360 -- Enable certain unicode tests when maxunicode not > 0xffff
* yaml/pyyaml#359 -- Use full_load in yaml-highlight example
* yaml/pyyaml#244 -- Document that PyYAML is implemented with Cython
* yaml/pyyaml#329 -- Fix for Python 3.10
* yaml/pyyaml#310 -- Increase size of index, line, and column fields
* yaml/pyyaml#260 -- Remove some unused imports
* yaml/pyyaml#163 -- Create timezone-aware datetimes when parsed as such
* yaml/pyyaml#363 -- Add tests for timezone
5.2 (2019-12-02)
* Repair incompatibilities introduced with 5.1. The default Loader was changed,
  but several methods like add_constructor still used the old default
  yaml/pyyaml#279 -- A more flexible fix for custom tag constructors
  yaml/pyyaml#287 -- Change default loader for yaml.add_constructor
  yaml/pyyaml#305 -- Change default loader for add_implicit_resolver, add_path_resolver
* Make FullLoader safer by removing python/object/apply from the default FullLoader
  yaml/pyyaml#347 -- Move constructor for object/apply to UnsafeConstructor
* Fix bug introduced in 5.1 where quoting went wrong on systems with sys.maxunicode <= 0xffff
  yaml/pyyaml#276 -- Fix logic for quoting special characters
* Other PRs:
  yaml/pyyaml#280 -- Update CHANGES for 5.1
5.1.2 (2019-07-30)
* Re-release of 5.1 with regenerated Cython sources to build properly for Python 3.8b2+
5.1.1 (2019-06-05)
* Re-release of 5.1 with regenerated Cython sources to build properly for Python 3.8b1
5.1 (2019-03-13)
* yaml/pyyaml#35 -- Some modernization of the test running
* yaml/pyyaml#42 -- Install tox in a virtualenv
* yaml/pyyaml#45 -- Allow colon in a plain scalar in a flow context
* yaml/pyyaml#48 -- Fix typos
* yaml/pyyaml#55 -- Improve RepresenterError creation
* yaml/pyyaml#59 -- Resolves #57, update readme issues link
* yaml/pyyaml#60 -- Document and test Python 3.6 support
* yaml/pyyaml#61 -- Use Travis CI built in pip cache support
* yaml/pyyaml#62 -- Remove tox workaround for Travis CI
* yaml/pyyaml#63 -- Adding support to Unicode characters over codepoint 0xffff
* yaml/pyyaml#75 -- add 3.12 changelog
* yaml/pyyaml#76 -- Fallback to Pure Python if Compilation fails
* yaml/pyyaml#84 -- Drop unsupported Python 3.3
* yaml/pyyaml#102 -- Include license file in the generated wheel package
* yaml/pyyaml#105 -- Removed Python 2.6 & 3.3 support
* yaml/pyyaml#111 -- Remove commented out Psyco code
* yaml/pyyaml#129 -- Remove call to `ord` in lib3 emitter code
* yaml/pyyaml#149 -- Test on Python 3.7-dev
* yaml/pyyaml#158 -- Support escaped slash in double quotes "\/"
* yaml/pyyaml#175 -- Updated link to pypi in release announcement
* yaml/pyyaml#181 -- Import Hashable from collections.abc
* yaml/pyyaml#194 -- Reverting yaml/pyyaml#74
* yaml/pyyaml#195 -- Build libyaml on travis
* yaml/pyyaml#196 -- Force cython when building sdist
* yaml/pyyaml#254 -- Allow to turn off sorting keys in Dumper (2)
* yaml/pyyaml#256 -- Make default_flow_style=False
* yaml/pyyaml#257 -- Deprecate yaml.load and add FullLoader and UnsafeLoader classes
* yaml/pyyaml#261 -- Skip certain unicode tests when maxunicode not > 0xffff
* yaml/pyyaml#263 -- Windows Appveyor build

Signed-off-by: Adolf Belka <adolf.belka@ipfire.org>

 --git a/config/rootfiles/packages/python3-yaml b/config/rootfiles/packages/python3-yaml
x 0870a2346..bd4009a08 100644
* yaml/pyyaml#195 -- Build libyaml on travis
* yaml/pyyaml#196 -- Force cython when building sdist
* yaml/pyyaml#254 -- Allow to turn off sorting keys in Dumper (2)
* yaml/pyyaml#256 -- Make default_flow_style=False
* yaml/pyyaml#257 -- Deprecate yaml.load and add FullLoader and Uns
oader classes
* yaml/pyyaml#261 -- Skip certain unicode tests when maxunicode not
xffff
* yaml/pyyaml#263 -- Windows Appveyor build

Signed-off-by: Adolf Belka <adolf.belka@ipfire.org>
Reviewed-by: Peter Müller <peter.mueller@ipfire.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants